Re: util-linux: orphan

2006-12-29 Thread Valdis . Kletnieks
On Wed, 27 Dec 2006 23:12:51 +0100, Karel Zak said:
>  For example, for my laptop it is true that "life is too short to
>  enable SELinux", but it's probably not true for servers in the bank where
>  I have money. (I hope so:-)

On the other hand, the case can be made that your laptop needs SELinux *more*
than the bank servers - because the bank servers are (presumably) heavily
firewalled and stripped down software-wise, and otherwise hardened.  But
your laptop is exactly one Firefox buffer overflow from being completely 
pwned...




Re: [BUG 2.6.20-rc2] atkbd.c: Spurious ACK

2006-12-29 Thread Rene Herman

Dmitry Torokhov wrote:


Somehow you get 2 ACKs in a row. I wonder if on your boxes i8042
pumps command and data into the keyboard before i8042_interrupt gets a
chance to run. Could you please apply the debug patch below and tell
me the pattern of the data flow.


Yes, I believe the trace below confirms what you said? Both the ED and 
the 00/05 are sent before the first ACK gets back, by a 1 jiffy margin:


drivers/input/serio/i8042.c: ed -> i8042 (panic blink) [N]
drivers/input/serio/i8042.c: 05 -> i8042 (panic blink) [N + 2]
drivers/input/serio/i8042.c: fa <- i8042 (interrupt, 0, 1) [N + 3]
drivers/input/serio/i8042.c: fa <- i8042 (interrupt, 0, 1) [N + 6]
drivers/input/serio/i8042.c: ed -> i8042 (panic blink) [M]
drivers/input/serio/i8042.c: 00 -> i8042 (panic blink) [M + 2]
drivers/input/serio/i8042.c: fa <- i8042 (interrupt, 0, 1) [M + 3]
drivers/input/serio/i8042.c: fa <- i8042 (interrupt, 0, 1) [M + 6]

The +2, +3 and +6 are constant. Forgot to pay attention to M - N, but I 
suppose it's not too important.


For me, the patch as you posted it is actually good to go. No more 
spurious ACK complaints...


Thanks,
Rene.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Oops in 2.6.19-rc6-mm2 on USB disconnect

2006-12-29 Thread Pete Zaitcev
On Fri, 29 Dec 2006 20:45:57 -0500, Eric Buddington <[EMAIL PROTECTED]> wrote:

> Kernel 2.6.19-rc6-mm2 on an Athlon XP gave me the following Oops when
> unplugging a USB device. I can usually plug and unplug devices without
> trouble, so this is probably not easily repeatable.
> [1742510.173840] PREEMPT 

Yeah, naturally not easy. Thanks anyway... I think I'll take a look.

-- Pete


Re: [patch 2.6.20-rc1 5/6] SA1100 GPIO wrappers

2006-12-29 Thread David Brownell
On Friday 29 December 2006 7:15 pm, Nicolas Pitre wrote:
> On Fri, 29 Dec 2006, David Brownell wrote:
> 
> > Here's a version that compiles ...
> 
> This patch is completely broken.

It's just what Philipp sent, with the "won't compile" bugs fixed.
Oh, and some #include tweaks.  Philipp?


> > Arch-neutral GPIO calls for PXA.
> 
> This is not PXA but SA1100 to start with.

I seem to have copied the wrong header comment, sorry; the original
patch had none.  It's marginally better than the header claiming it
was a PXA header...


> and you most probably need to protect the implied read-modify-write 
> cycle with a spinlock unless the generic gpio API expects this 
> protection is the responsibility of the caller.

No such lock is known to the caller.  Some of those calls will need
to move to a C file somewhere.

- Dave


Re: [PATCH 2.6.16.29 1/1] memory: enhance Linux swap subsystem

2006-12-29 Thread Zhou Yingchao

2006/12/27, yunfeng zhang <[EMAIL PROTECTED]>:

For architectures with multiple address spaces and multiple memory inodes, we can introduce
a new core object, the section, which has several features

Do you mean "in-memory inode"  or "memory node(pglist_data)" by "memory inode" ?

The question I raised is whether the swap subsystem should be deployed on layer 2 or
layer 3, as described in Documentation/vm_pps.txt of my patch. For a multiple
memory inode architecture, I think the special memory model should be encapsulated in
layer 3 (architecture-dependent).

I guess that you want to remove arch-dependent
code from the swap subsystem, just as the pud was introduced in the
page-table related code. Is that right?
However, you should verify that your changes will not degrade
system performance. Also, you will need to maintain it for a long time alongside
the evolution of the mainline kernel before it is accepted.

Best regards
--
Yingchao Zhou
***
Institute Of Computing Technology
Chinese Academy of Sciences
***


Re: [BUG 2.6.20-rc2] atkbd.c: Spurious ACK

2006-12-29 Thread Dmitry Torokhov
On Friday 29 December 2006 14:08, Rene Herman wrote:
> Laurent Riffard wrote:
> 
> > Le 29.12.2006 06:54, Rene Herman a écrit :
> 
> >> Not even an analog camera, but with or without the above, I get a single:
> >>
> >> " <7>drivers/input/serio/i8042.c: fa <- i8042 (interrupt, 0, 1) [ 902]"
> 
> ... and when I add "debug" as a kernel param so that I actually get to 
> see them (doh) I get the same as Laurent:
> 
> > 
> > drivers/input/serio/i8042.c: fa <- i8042 (interrupt, 0, 1) [35172]
> > drivers/input/serio/i8042.c: fa <- i8042 (interrupt, 0, 1) [35172]
> > atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access hardware directly.
> > drivers/input/serio/i8042.c: fa <- i8042 (interrupt, 0, 1) [35296]
> > drivers/input/serio/i8042.c: fa <- i8042 (interrupt, 0, 1) [35297]
> > atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access hardware directly.
> > drivers/input/serio/i8042.c: fa <- i8042 (interrupt, 0, 1) [35420]
> > drivers/input/serio/i8042.c: fa <- i8042 (interrupt, 0, 1) [35421]
> > atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access hardware directly.
> > drivers/input/serio/i8042.c: fa <- i8042 (interrupt, 0, 1) [35544]
> > drivers/input/serio/i8042.c: fa <- i8042 (interrupt, 0, 1) [35545]
> > atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access hardware directly.
> > ===
> 
> I tried just ignoring more ACKs in i8042_interrupt() but that didn't do 
> anything other than alternating between 2 and 1 i8042.c printks between 
> atkbd.c printks when ignoring an even or odd number, respectively. I 
> guess it's atkbd.c which needs to ack something to keep it from just 
> being delivered over and over again or something like it?
>

No, atkbd does not need to ACK anything; it is the keyboard controller
that ACKs commands sent to it. Normally there is only one owner of a serio
port, and atkbd rightfully complains when it gets ACKs from the keyboard
controller that it does not expect. However, during a panic we cut in
and start sending commands to the keyboard without atkbd's knowledge.
The keyboard ACKs the commands we send to it, and these ACKs reach
atkbd, causing it to complain.

Somehow you get 2 ACKs in a row. I wonder if on your boxes i8042 pumps
command and data into the keyboard before i8042_interrupt gets a chance to
run. Could you please apply the debug patch below and tell me the
pattern of the data flow.

Thank you.

-- 
Dmitry

Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]>
---

 drivers/input/serio/i8042.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

Index: work/drivers/input/serio/i8042.c
===
--- work.orig/drivers/input/serio/i8042.c
+++ work/drivers/input/serio/i8042.c
@@ -371,7 +371,7 @@ static irqreturn_t i8042_interrupt(int i
if (unlikely(i8042_suppress_kbd_ack))
if (port_no == I8042_KBD_PORT_NO &&
(data == 0xfa || data == 0xfe)) {
-   i8042_suppress_kbd_ack = 0;
+   i8042_suppress_kbd_ack--;
goto out;
}
 
@@ -838,13 +838,14 @@ static long i8042_panic_blink(long count
led ^= 0x01 | 0x04;
while (i8042_read_status() & I8042_STR_IBF)
DELAY;
-   i8042_suppress_kbd_ack = 1;
+   dbg("%02x -> i8042 (panic blink)", 0xed);
+   i8042_suppress_kbd_ack = 2;
i8042_write_data(0xed); /* set leds */
DELAY;
while (i8042_read_status() & I8042_STR_IBF)
DELAY;
DELAY;
-   i8042_suppress_kbd_ack = 1;
+   dbg("%02x -> i8042 (panic blink)", led);
i8042_write_data(led);
DELAY;
last_blink = count;
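A small user-space model may help show why the debug patch arms a counter instead of a 0/1 flag: if both the 0xED command and the LED byte reach the keyboard before the first interrupt runs (as in Rene's trace), two ACKs come back and both must be swallowed. The names below are hypothetical stand-ins for i8042_suppress_kbd_ack and the check in i8042_interrupt(); this is a sketch of the idea, not the driver code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes the keyboard sends back: ACK and RESEND. */
#define KBD_ACK    0xfa
#define KBD_RESEND 0xfe

/* Hypothetical stand-in for i8042_suppress_kbd_ack: how many keyboard
 * ACK/RESEND bytes are still expected from the panic-blink path. */
static int suppress_kbd_ack;

/* Panic-blink side: it is about to write 0xED and then the LED byte,
 * so it expects two acknowledgements in total. */
static void model_panic_blink_arm(void)
{
	suppress_kbd_ack = 2;
}

/* Interrupt side: returns true if the byte is swallowed instead of
 * being passed on to atkbd. */
static bool model_filter_kbd_byte(bool from_kbd_port, uint8_t data)
{
	if (suppress_kbd_ack > 0 && from_kbd_port &&
	    (data == KBD_ACK || data == KBD_RESEND)) {
		suppress_kbd_ack--;	/* one expected ACK consumed */
		return true;
	}
	return false;
}
```

With a single flag, the second ACK would fall through to atkbd and trigger the "Spurious ACK" message; with the counter, only a third, unexpected ACK does.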


[KORG] re: [PATCH] cfq-iosched: tighten allow merge criteria - error in final return value ?

2006-12-29 Thread DAVID HORNER
 I noticed a probable error in commit 719d34027e1a186e46a3952e8a24bf91ecc33837

> 
> - if (cfqq != RQ_CFQQ(rq))
> - return 0;
> + 
> + if (cfqq == RQ_CFQQ(rq))
> + return 1;
>
>  return 1;
>  }

  Either the final return value should be 0 (zero) or
   the if statement is redundant.
 
   (I lack the skill to make a timely patch; hope this helps.)
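The reporter's point can be shown with a reduced, hypothetical model of the end of the merge check: `cfqq` is the queue of the incoming request and `rq_cfqq` stands in for `RQ_CFQQ(rq)`. The second function assumes the intended fix is a final `return 0`, which is one of the two options the report leaves open.

```c
/* As committed: both paths return 1, so the comparison has no effect. */
static int allow_merge_as_committed(const void *cfqq, const void *rq_cfqq)
{
	if (cfqq == rq_cfqq)
		return 1;

	return 1;
}

/* Assumed intent: refuse the merge when the queues differ. */
static int allow_merge_final_zero(const void *cfqq, const void *rq_cfqq)
{
	if (cfqq == rq_cfqq)
		return 1;

	return 0;
}
```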




Re: [PATCH] Print sysrq-m messages with KERN_INFO priority

2006-12-29 Thread * *

Was this patch tested?

On 12/29/06, Theodore Ts'o <[EMAIL PROTECTED]> wrote:

Print messages resulting from sysrq-m with a KERN_INFO instead of the
default KERN_WARNING priority

Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>

Index: linux-2.6/mm/page_alloc.c
===
--- linux-2.6.orig/mm/page_alloc.c  2006-12-25 09:21:54.0 -0500
+++ linux-2.6/mm/page_alloc.c   2006-12-29 17:19:11.0 -0500
@@ -1505,7 +1505,7 @@
 static inline void show_node(struct zone *zone)
 {
if (NUMA_BUILD)
-   printk("Node %d ", zone_to_nid(zone));
+   printk(KERN_INFO, "Node %d ", zone_to_nid(zone));


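Here there is a comma after KERN_INFO, which does not occur elsewhere. With the comma, KERN_INFO becomes the entire format string and "Node %d " is passed as an unused argument, so the node number is never printed.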
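A user-space illustration of the difference, using snprintf() in place of printk() and assuming the 2.6-era definition of KERN_INFO as the string literal "<6>". Without the comma the literals concatenate into one format string; with it, the level prefix is the whole format. GCC's -Wformat checking can warn about the extra arguments in the second form.

```c
#include <stdio.h>

/* 2.6-era log-level prefix: a plain string literal. */
#define KERN_INFO "<6>"

/* Intended form: adjacent literals concatenate into one format string. */
static int format_node_intended(char *buf, size_t len, int nid)
{
	return snprintf(buf, len, KERN_INFO "Node %d ", nid);
}

/* Form in the patch: "<6>" is the format; "Node %d " and nid are
 * extra arguments that are never consumed. */
static int format_node_with_comma(char *buf, size_t len, int nid)
{
	return snprintf(buf, len, KERN_INFO, "Node %d ", nid);
}
```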


 }

 void si_meminfo(struct sysinfo *val)
@@ -1566,8 +1566,8 @@

pageset = zone_pcp(zone, cpu);

-   printk("CPU %4d: Hot: hi:%5d, btch:%4d usd:%4d   "
-  "Cold: hi:%5d, btch:%4d usd:%4d\n",
+   printk(KERN_INFO "CPU %4d: Hot: hi:%5d, btch:%4d "
+  "usd:%4d   Cold: hi:%5d, btch:%4d usd:%4d\n",
   cpu, pageset->pcp[0].high,
   pageset->pcp[0].batch, pageset->pcp[0].count,
   pageset->pcp[1].high, pageset->pcp[1].batch,
@@ -1577,7 +1577,7 @@

get_zone_counts(&active, &inactive, &free);

-   printk("Active:%lu inactive:%lu dirty:%lu writeback:%lu "
+   printk(KERN_INFO "Active:%lu inactive:%lu dirty:%lu writeback:%lu "
"unstable:%lu free:%u slab:%lu mapped:%lu pagetables:%lu\n",
active,
inactive,
@@ -1619,7 +1619,7 @@
zone->pages_scanned,
(zone->all_unreclaimable ? "yes" : "no")
);
-   printk("lowmem_reserve[]:");
+   printk(KERN_INFO "lowmem_reserve[]:");
for (i = 0; i < MAX_NR_ZONES; i++)
printk(" %lu", zone->lowmem_reserve[i]);
printk("\n");
@@ -1875,7 +1875,7 @@
/* cpuset refresh routine should be here */
}
vm_total_pages = nr_free_pagecache_pages();
-   printk("Built %i zonelists.  Total pages: %ld\n",
+   printk(KERN_INFO "Built %i zonelists.  Total pages: %ld\n",
num_online_nodes(), vm_total_pages);
 }

Index: linux-2.6/mm/swap_state.c
===
--- linux-2.6.orig/mm/swap_state.c  2006-07-04 18:38:19.0 -0400
+++ linux-2.6/mm/swap_state.c   2006-12-29 17:18:42.0 -0500
@@ -57,12 +57,14 @@

 void show_swap_cache_info(void)
 {
-   printk("Swap cache: add %lu, delete %lu, find %lu/%lu, race %lu+%lu\n",
+   printk(KERN_INFO "Swap cache: add %lu, delete %lu, find %lu/%lu, race %lu+%lu\n",
swap_cache_info.add_total, swap_cache_info.del_total,
swap_cache_info.find_success, swap_cache_info.find_total,
swap_cache_info.noent_race, swap_cache_info.exist_race);
-   printk("Free swap  = %lukB\n", nr_swap_pages << (PAGE_SHIFT - 10));
-   printk("Total swap = %lukB\n", total_swap_pages << (PAGE_SHIFT - 10));
+   printk(KERN_INFO "Free swap  = %lukB\n",
+  nr_swap_pages << (PAGE_SHIFT - 10));
+   printk(KERN_INFO "Total swap = %lukB\n",
+  total_swap_pages << (PAGE_SHIFT - 10));
 }

 /*




Re: [RFC][PATCH -mm take2 3/5] add interface for netconsole using sysfs

2006-12-29 Thread Stephen Hemminger
Keiichi KII wrote:
> From: Keiichi KII <[EMAIL PROTECTED]>
>
> This patch contains the following changes.
>
> Create a sysfs entry for netconsole in /sys/class/misc.
> This entry has the netconsole-related elements listed below.
> You can change the netconsole configuration (writable attributes such as IP
> address, port number and so on) and check its current configuration.
>
> -+- /sys/class/misc/
>  |-+- netconsole/
>|-+- port1/
>| |--- id  [r--r--r--]  unique port id
>| |--- remove  [-w---]  if you write something to "remove",
>| | this port is removed.
>   
IMHO this kind of "magic side effect" is a misuse of sysfs, and it would
make proper locking impossible. How do you deal with the dangling
reference to the netconsole object?

f = open("... netconsole/port1/remove", O_WRONLY);
write(f, "", 1);
sleep(2);
write(f, "", 1);    /* this probably would crash... */


Maybe have a state variable/sysfs file, so you could set up the port and
turn it on/off with a write.
>| |--- dev_name[r--r--r--]  network interface name
>   

Please don't use dev_name; use a symlink instead. If the device is
renamed, the dev_name will be wrong, but the symlink to the net_device
kobject should still be okay.
>| |--- local_ip[rw-r--r--]  source IP to use, writable
>| |--- local_port  [rw-r--r--]  source port number for UDP packets, writable
>| |--- local_mac   [r--r--r--]  source MAC address
>| |--- remote_ip   [rw-r--r--]  IP address for logging agent, writable
>| |--- remote_port [rw-r--r--]  port number for logging agent, writable
>| |--- remote_mac  [rw-r--r--]  MAC address for logging agent, writable
>|--- port2/
>|--- port3/
>...
>
> Signed-off-by: Keiichi KII <[EMAIL PROTECTED]>
> Signed-off-by: Takayoshi Kochi <[EMAIL PROTECTED]>
>   



Re: [PATCH] Print sysrq-m messages with KERN_INFO priority

2006-12-29 Thread Dave Jones
On Fri, Dec 29, 2006 at 08:42:47PM -0800, Andrew Morton wrote:
 > On Fri, 29 Dec 2006 22:24:53 -0500
 > "Theodore Ts'o" <[EMAIL PROTECTED]> wrote:
 > 
 > > Print messages resulting from sysrq-m with a KERN_INFO instead of the
 > > default KERN_WARNING priority
 > 
 > hm, I wonder why.  If someone does sysrq- then they presumably want
 > to display the result?  Tricky.

I looked at this and got even more puzzled.
__handle_sysrq temporarily sets the loglevel to 7 (KERN_DEBUG) for the
duration of the sysrq- output.

Which is odd, as KERN_DEBUG stuff is usually hidden, yet the
printk's that lack loglevels still seem to end up onscreen.

Ted's patch also misses a few of the printk's in show_free_areas()
which seems inconsistent, or am I just confused?
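One way to reason about this, assuming the 2.6-era rule in kernel/printk.c that a message goes to the console only when its level is strictly below console_loglevel, and that printks without a prefix get the default message level 4 (KERN_WARNING): raising console_loglevel to 7 lets level 6 (KERN_INFO) and unprefixed messages through while still hiding KERN_DEBUG. The model below is a sketch of that comparison only.

```c
#include <stdbool.h>

/* Levels as in 2.6-era <linux/kernel.h>. */
enum { LVL_WARNING = 4, LVL_INFO = 6, LVL_DEBUG = 7 };

/* Assumption: unprefixed printk() uses the default level 4. */
#define DEFAULT_MESSAGE_LOGLEVEL LVL_WARNING

/* A message reaches the console only if its level is below the
 * current console_loglevel. */
static bool reaches_console(int msg_level, int console_loglevel)
{
	return msg_level < console_loglevel;
}
```

Under this rule, Ted's KERN_INFO messages would still be shown during sysrq output, and would be hidden at the normal console level.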

Dave

-- 
http://www.codemonkey.org.uk


Re: [PATCH] Print sysrq-m messages with KERN_INFO priority

2006-12-29 Thread Andrew Morton
On Fri, 29 Dec 2006 22:24:53 -0500
"Theodore Ts'o" <[EMAIL PROTECTED]> wrote:

> Print messages resulting from sysrq-m with a KERN_INFO instead of the
> default KERN_WARNING priority

hm, I wonder why.  If someone does sysrq- then they presumably want
to display the result?  Tricky.

Is this patch a consistency thing?


Re: Display class

2006-12-29 Thread Dmitry Torokhov
Hi,

On Tuesday 05 December 2006 13:03, James Simmons wrote:
> +int probe_edid(struct display_device *dev, void *data)
> +{
> +   struct fb_monspecs spec;
> +   ssize_t size = 45;

const ssize_t size = 45? 

> +
> +   dev->name = kzalloc(size, GFP_KERNEL);

Why do you need kzalloc here?

> +   fb_edid_to_monspecs((unsigned char *) data, &spec);
> +   strcpy(dev->name, spec.manufacturer);

You seem to be overwriting dev->name in the very next line?

> +   return snprintf(dev->name, size, "%s %s %s\n", spec.manufacturer, spec.monitor, spec.ascii);
> 

Is the result of snprintf interesting to the callers?
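A user-space sketch of how probe_edid() might look with the review comments applied, under a few assumptions: the structs are hypothetical stand-ins carrying only the fields used here, malloc() stands in for kmalloc(), the allocation-failure check is an addition, and callers are assumed not to need the snprintf() length. Since snprintf() always NUL-terminates a non-empty buffer, a zeroing allocation and the preceding strcpy() are unnecessary.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for fb_monspecs and display_device. */
struct monspecs_sketch {
	const char *manufacturer;
	const char *monitor;
	const char *ascii;
};

struct display_sketch {
	char *name;
};

static int probe_edid_sketch(struct display_sketch *dev,
			     const struct monspecs_sketch *spec)
{
	const size_t size = 45;

	dev->name = malloc(size);	/* kmalloc(size, GFP_KERNEL) in-kernel */
	if (!dev->name)
		return -ENOMEM;

	/* One formatted write; output is truncated and terminated at size. */
	snprintf(dev->name, size, "%s %s %s\n",
		 spec->manufacturer, spec->monitor, spec->ascii);
	return 0;
}
```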

-- 
Dmitry


[PATCH] Print sysrq-m messages with KERN_INFO priority

2006-12-29 Thread Theodore Ts'o
Print messages resulting from sysrq-m with a KERN_INFO instead of the
default KERN_WARNING priority

Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>

Index: linux-2.6/mm/page_alloc.c
===
--- linux-2.6.orig/mm/page_alloc.c  2006-12-25 09:21:54.0 -0500
+++ linux-2.6/mm/page_alloc.c   2006-12-29 17:19:11.0 -0500
@@ -1505,7 +1505,7 @@
 static inline void show_node(struct zone *zone)
 {
if (NUMA_BUILD)
-   printk("Node %d ", zone_to_nid(zone));
+   printk(KERN_INFO, "Node %d ", zone_to_nid(zone));
 }
 
 void si_meminfo(struct sysinfo *val)
@@ -1566,8 +1566,8 @@
 
pageset = zone_pcp(zone, cpu);
 
-   printk("CPU %4d: Hot: hi:%5d, btch:%4d usd:%4d   "
-  "Cold: hi:%5d, btch:%4d usd:%4d\n",
+   printk(KERN_INFO "CPU %4d: Hot: hi:%5d, btch:%4d "
+  "usd:%4d   Cold: hi:%5d, btch:%4d usd:%4d\n",
   cpu, pageset->pcp[0].high,
   pageset->pcp[0].batch, pageset->pcp[0].count,
   pageset->pcp[1].high, pageset->pcp[1].batch,
@@ -1577,7 +1577,7 @@
 
get_zone_counts(&active, &inactive, &free);
 
-   printk("Active:%lu inactive:%lu dirty:%lu writeback:%lu "
+   printk(KERN_INFO "Active:%lu inactive:%lu dirty:%lu writeback:%lu "
"unstable:%lu free:%u slab:%lu mapped:%lu pagetables:%lu\n",
active,
inactive,
@@ -1619,7 +1619,7 @@
zone->pages_scanned,
(zone->all_unreclaimable ? "yes" : "no")
);
-   printk("lowmem_reserve[]:");
+   printk(KERN_INFO "lowmem_reserve[]:");
for (i = 0; i < MAX_NR_ZONES; i++)
printk(" %lu", zone->lowmem_reserve[i]);
printk("\n");
@@ -1875,7 +1875,7 @@
/* cpuset refresh routine should be here */
}
vm_total_pages = nr_free_pagecache_pages();
-   printk("Built %i zonelists.  Total pages: %ld\n",
+   printk(KERN_INFO "Built %i zonelists.  Total pages: %ld\n",
num_online_nodes(), vm_total_pages);
 }
 
Index: linux-2.6/mm/swap_state.c
===
--- linux-2.6.orig/mm/swap_state.c  2006-07-04 18:38:19.0 -0400
+++ linux-2.6/mm/swap_state.c   2006-12-29 17:18:42.0 -0500
@@ -57,12 +57,14 @@
 
 void show_swap_cache_info(void)
 {
-   printk("Swap cache: add %lu, delete %lu, find %lu/%lu, race %lu+%lu\n",
+   printk(KERN_INFO "Swap cache: add %lu, delete %lu, find %lu/%lu, race %lu+%lu\n",
swap_cache_info.add_total, swap_cache_info.del_total,
swap_cache_info.find_success, swap_cache_info.find_total,
swap_cache_info.noent_race, swap_cache_info.exist_race);
-   printk("Free swap  = %lukB\n", nr_swap_pages << (PAGE_SHIFT - 10));
-   printk("Total swap = %lukB\n", total_swap_pages << (PAGE_SHIFT - 10));
+   printk(KERN_INFO "Free swap  = %lukB\n",
+  nr_swap_pages << (PAGE_SHIFT - 10));
+   printk(KERN_INFO "Total swap = %lukB\n",
+  total_swap_pages << (PAGE_SHIFT - 10));
 }
 
 /*


[patch] aio: make aio_ring_info->nr_pages an unsigned int

2006-12-29 Thread Chen, Kenneth W
The number of io_events allowed in the AIO event queue is currently no more
than 2^32-1, because the syscall is defined as:

asmlinkage long sys_io_setup(unsigned nr_events, aio_context_t __user *ctxp)

We internally allocate a ring buffer for nr_events and keep track of
page descriptors for each of these ring buffer pages.  Since the page size
is significantly larger than the AIO event size (4096 versus 32), I don't
think it is ever possible to overflow nr_pages as a 32-bit quantity.

This patch changes nr_pages to unsigned int. On 64-bit architectures,
changing it to unsigned int also allows better packing of the
aio_ring_info structure.
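The overflow argument can be checked with a little arithmetic. This sketch assumes 4096-byte pages and 32-byte events (struct io_event is four 64-bit fields) and ignores the ring header; the largest allowed nr_events then needs 2^25 pages, far below 2^32.

```c
#include <stdint.h>

/* Assumptions: 4 KiB pages, 32-byte events, ring header ignored. */
#define SKETCH_PAGE_SIZE  4096u
#define SKETCH_EVENT_SIZE 32u

/* Pages needed to hold nr_events events, computed in 64 bits so the
 * multiplication itself cannot overflow. */
static uint64_t ring_pages_needed(uint32_t nr_events)
{
	uint64_t bytes = (uint64_t)nr_events * SKETCH_EVENT_SIZE;

	return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```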


Signed-off-by: Ken Chen <[EMAIL PROTECTED]>

--- ./include/linux/aio.h.orig  2006-12-24 22:31:55.0 -0800
+++ ./include/linux/aio.h   2006-12-24 22:41:28.0 -0800
@@ -165,7 +165,7 @@ struct aio_ring_info {
 
struct page **ring_pages;
spinlock_t  ring_lock;
-   long        nr_pages;
+   unsigned    nr_pages;
 
unsigned    nr, tail;
 


Re: [patch 2.6.20-rc1 5/6] SA1100 GPIO wrappers

2006-12-29 Thread Nicolas Pitre
On Fri, 29 Dec 2006, David Brownell wrote:

> Here's a version that compiles ...

This patch is completely broken.

> Arch-neutral GPIO calls for PXA.

This is not PXA but SA1100 to start with.

> Signed-off-by: David Brownell <[EMAIL PROTECTED]>
> 
> Index: pxa/include/asm-arm/arch-sa1100/gpio.h
> ===
> --- /dev/null 1970-01-01 00:00:00.0 +
> +++ pxa/include/asm-arm/arch-sa1100/gpio.h    2006-12-29 18:21:00.0 -0800
> @@ -0,0 +1,100 @@

[...]

> +static inline int gpio_direction_input(unsigned gpio)
> +{
> + if (gpio > GPIO_MAX)
> + return -EINVAL;
> + GPDR = (GPDR_In << gpio);

This is crap.  It will expand to GPDR = 0, effectively making _all_ gpios 
inputs.

What you want here is:

GPDR &= ~(1 << gpio);

and you most probably need to protect the implied read-modify-write 
cycle with a spinlock unless the generic gpio API expects this 
protection is the responsibility of the caller.

> +static inline int gpio_direction_output(unsigned gpio)
> +{
> + if (gpio > GPIO_MAX)
> + return -EINVAL;
> + GPDR = (GPDR_Out << gpio);

Same issue, although this would make all gpios inputs except for the 
specified one.

What you want is:

GPDR |= (1 << gpio);

And again spinlock protection is probably needed.
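The suggested bit operations can be exercised off-target with a plain variable standing in for GPDR (hypothetical, as is the GPIO limit of 27 used here). Each call changes only its own pin's bit; in the kernel the read-modify-write would additionally sit under a spinlock, which is only noted in a comment below.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the GPDR register: a set bit means output. */
static uint32_t GPDR;

#define SKETCH_GPIO_MAX 27	/* assumption: GPIO 0..27 */

static int sketch_gpio_direction_input(unsigned gpio)
{
	if (gpio > SKETCH_GPIO_MAX)
		return -EINVAL;
	/* In-kernel, wrap this read-modify-write in
	 * spin_lock_irqsave()/spin_unlock_irqrestore(). */
	GPDR &= ~(1u << gpio);	/* clear only this pin's bit */
	return 0;
}

static int sketch_gpio_direction_output(unsigned gpio)
{
	if (gpio > SKETCH_GPIO_MAX)
		return -EINVAL;
	GPDR |= 1u << gpio;	/* set only this pin's bit */
	return 0;
}
```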

> +static inline int __gpio_get_value(unsigned gpio)
> +{
> + return GPLR & GPIO_GPIO(gpio);
> +}
> +
> +#define gpio_get_value(gpio) \
> + (__builtin_constant_p(gpio) \
> + ? __gpio_get_value(gpio)\
> + : sa1100_gpio_get_value(gpio))
> +

Please drop the out of line version.  It will always be more costly than 
the inline version even for non constant gpio values.  And I think the 
usage of GPIO_GPIO(gpio) is more obfuscating than directly using
(1 << gpio).

> +static inline void __gpio_set_value(unsigned gpio, int value)
> +{
> + if (value)
> + GPSR = GPIO_GPIO(gpio);
> + else
> + GPCR = GPIO_GPIO(gpio);
> +}
> +
> +#define gpio_set_value(gpio,value)   \
> + (__builtin_constant_p(gpio) \
> + ? __gpio_set_value(gpio, value) \
> + : sa1100_gpio_set_value(gpio, value))

Same as above.
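A sketch of inline-only get/set helpers using `(1u << gpio)` directly, as suggested. The pin levels are modeled by a variable (a hypothetical stand-in for GPLR), and the write-only set and clear registers GPSR/GPCR are modeled as functions. Because a single write to GPSR or GPCR affects only the pins whose bits are set, setting a value needs no read-modify-write.

```c
#include <stdint.h>

/* Hypothetical stand-in for the pin-level register GPLR. */
static uint32_t sketch_gplr;

/* Models of the write-only set (GPSR) and clear (GPCR) registers. */
static void sketch_write_gpsr(uint32_t mask) { sketch_gplr |= mask; }
static void sketch_write_gpcr(uint32_t mask) { sketch_gplr &= ~mask; }

static inline int sketch_gpio_get_value(unsigned gpio)
{
	return sketch_gplr & (1u << gpio);	/* nonzero if the pin is high */
}

static inline void sketch_gpio_set_value(unsigned gpio, int value)
{
	if (value)
		sketch_write_gpsr(1u << gpio);
	else
		sketch_write_gpcr(1u << gpio);
}
```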


Nicolas


Re: [BUG KERNEL 2.6.20-rc1] ftp: get or put stops during file-transfer

2006-12-29 Thread Komuro

> 
> > I investigated the ftp-file-transfer-stop problem by git-bisect method,
> > and found this problem was introduced by
> > "[TCP]: MD5 Signature Option (RFC2385) support" patch.
> > 
> > Mr.YOSHIFUJI san, please fix this problem.
> 
> Hmm, have you tried disabling CONFIG_TCP_MD5SIG?
> (Is it already disabled?)

This problem happens whether CONFIG_TCP_MD5SIG is enabled or disabled.

> Is there any specific transfer size that reproduces this?

When I transfer a 40 MB file by ftp five or more times,
 this problem happens.


> Do you see a similar issue with other simple applications?

Sorry, I could not reproduce this problem with other applications.

Thanks,

Best Regards
Komuro.


[patch] aio: remove spurious ring head index modulo info->nr

2006-12-29 Thread Chen, Kenneth W
In aio_read_evt(), ring->head never reaches info->nr, because the wrap
is already done when the ring head index is updated:

if (head != ring->tail) {
...
head = (head + 1) % info->nr;
ring->head = head;
}

This makes the modulo of ring->head into local variable head unnecessary.
This patch removes that bogus code.
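The invariant relied on here can be shown with a small hypothetical ring model: because every update of head is already reduced modulo nr, head always stays in [0, nr), and the consumer can use it as an index directly. This is an illustration of the ring discipline, not the fs/aio.c code.

```c
#include <stdbool.h>

#define SKETCH_RING_NR 4u

/* Hypothetical ring: head and tail are always kept in [0, nr). */
struct ring_sketch {
	unsigned head, tail;
	int events[SKETCH_RING_NR];
};

/* Producer; returns false when full (one slot is left unused). */
static bool ring_push(struct ring_sketch *r, int ev)
{
	unsigned next = (r->tail + 1) % SKETCH_RING_NR;

	if (next == r->head)
		return false;
	r->events[r->tail] = ev;
	r->tail = next;
	return true;
}

/* Consumer: head is used as-is, since every store to it was already
 * reduced modulo nr. */
static bool ring_pop(struct ring_sketch *r, int *ev)
{
	unsigned head = r->head;	/* no "% nr" needed */

	if (head == r->tail)
		return false;
	*ev = r->events[head];
	r->head = (head + 1) % SKETCH_RING_NR;
	return true;
}
```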


Signed-off-by: Ken Chen <[EMAIL PROTECTED]>


--- ./fs/aio.c.orig 2006-12-24 22:01:36.0 -0800
+++ ./fs/aio.c  2006-12-24 22:34:48.0 -0800
@@ -1019,7 +1019,7 @@ static int aio_read_evt(struct kioctx *i
 {
struct aio_ring_info *info = &ioctx->ring_info;
struct aio_ring *ring;
-   unsigned long head;
+   unsigned int head;
int ret = 0;
 
ring = kmap_atomic(info->ring_pages[0], KM_USER0);
@@ -1032,7 +1032,7 @@ static int aio_read_evt(struct kioctx *i
 
spin_lock(&info->ring_lock);
 
-   head = ring->head % info->nr;
+   head = ring->head;
if (head != ring->tail) {
struct io_event *evp = aio_ring_event(info, head, KM_USER1);
*ent = *evp;


Re: potential for buffer_head shrinkage.

2006-12-29 Thread Dave Jones
On Fri, Dec 29, 2006 at 09:45:54PM -0500, Dave Jones wrote:
 > Looking at struct buffer_head, it seems that b_state
 > uses at most 15 bits, where it's defined as a 64bit entity
 > due to it being used by bit_spin_lock and friends.
 > 
 > Given it's not uncommon for a few hundred thousand of these
 > to be present, I wonder if it's worth the effort of folding
 > b_count into the upper bits of b_state, thus shrinking
 > buffer_head by 16 bits?  This would still leave 32 bits
 > 'wasted' for further bh_state_bits expansion if necessary.

My math here is based on a 64-bit compile, btw, in case that wasn't obvious.
On 32-bit there wouldn't be room for expansion.

Dave

-- 
http://www.codemonkey.org.uk


[patch] aio: streamline read events after woken up

2006-12-29 Thread Chen, Kenneth W
The read event loop in the blocking path is also inefficient. For every
event it reaps (if not blocking), it does the following in a loop:

  while (i < nr) {
prepare_to_wait_exclusive
aio_read_evt
finish_wait
...
  }

Given the previous patch "aio: add per task aio wait event condition",
which properly wakes up the waiting process only once there are enough
events to reap, it's a plain waste of time for the process to insert
itself into the wait queue and then immediately remove itself from the
wait queue for *every* event-reap iteration.

This patch moves the wait queue insertion/deletion out of the event-reap
loop and streamlines event reaping after the process wakes up.
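The saving can be illustrated with a small counting model. It is hypothetical and single-threaded: `available` is the number of completed events already in the ring, `waitq_ops` counts prepare_to_wait_exclusive()/finish_wait() pairs, and sleeping is not modeled (an empty ring just ends the reap). In this sketch a read is attempted only while the caller's buffer has room.

```c
struct reap_model {
	int available;	/* completed events sitting in the ring */
	int waitq_ops;	/* prepare/finish pairs performed */
};

static int model_read_evt(struct reap_model *m)
{
	if (m->available == 0)
		return 0;
	m->available--;
	return 1;
}

/* Old shape: a wait-queue insert/remove around every single read. */
static int reap_old(struct reap_model *m, int nr)
{
	int i = 0;

	while (i < nr) {
		m->waitq_ops++;
		if (!model_read_evt(m))
			break;	/* the real code would sleep here */
		i++;
	}
	return i;
}

/* New shape: one wait-queue pair up front, then reap without
 * touching the wait queue. */
static int reap_new(struct reap_model *m, int nr)
{
	int i = 0;

	m->waitq_ops++;
	while (i < nr && model_read_evt(m))
		i++;
	return i;
}
```

With five events ready and nr = 5, the old shape performs five wait-queue pairs and the new shape one.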


Signed-off-by: Ken Chen <[EMAIL PROTECTED]>

--- ./fs/aio.c.orig 2006-12-24 17:04:52.0 -0800
+++ ./fs/aio.c  2006-12-24 17:05:10.0 -0800
@@ -1174,42 +1174,40 @@ retry:
}
 
aio_init_wait(&wait);
+wait:
+   prepare_to_wait_exclusive(&ctx->wait, &wait.wait, TASK_INTERRUPTIBLE);
+   ret = aio_read_evt(ctx, &ent);
+   if (!ret) {
+   wait.nr_wait = min_nr - i;
+   schedule();
+   if (signal_pending(tsk))
+   ret = -EINTR;
+   }
+   finish_wait(&ctx->wait, &wait.wait);
+
+   if (ret < 0)
+   goto out_cleanup;
+
while (likely(i < nr)) {
-   do {
-   prepare_to_wait_exclusive(&ctx->wait, &wait.wait,
- TASK_INTERRUPTIBLE);
-   ret = aio_read_evt(ctx, &ent);
-   if (ret)
-   break;
-   if (min_nr <= i)
-   break;
-   ret = 0;
-   if (to.timed_out)   /* Only check after read evt */
-   break;
-   wait.nr_wait = min_nr - i;
-   schedule();
-   if (signal_pending(tsk)) {
-   ret = -EINTR;
+   if (ret) {
+   if (unlikely(copy_to_user(event, &ent, sizeof(ent)))) {
+   dprintk("aio: lost an event due to EFAULT.\n");
+   ret = -EFAULT;
break;
}
-   /*ret = aio_read_evt(ctx, &ent);*/
-   } while (1) ;
-   finish_wait(&ctx->wait, &wait.wait);
-
-   if (unlikely(ret <= 0))
-   break;
+   event++;
+   i++;
+   }
 
-   ret = -EFAULT;
-   if (unlikely(copy_to_user(event, &ent, sizeof(ent)))) {
-   dprintk("aio: lost an event due to EFAULT.\n");
+   ret = aio_read_evt(ctx, &ent);
+   if (unlikely(!ret)) {
+   if (i < min_nr && !to.timed_out)
+   goto wait;
break;
}
-
-   /* Good, event copied to userland, update counts. */
-   event ++;
-   i ++;
}
 
+out_cleanup:
if (timeout)
clear_timeout(&to);
 out:


potential for buffer_head shrinkage.

2006-12-29 Thread Dave Jones
Looking at struct buffer_head, it seems that b_state
uses at most 15 bits, even though it's defined as a 64-bit entity
because it's used by bit_spin_lock and friends.

Given it's not uncommon for a few hundred thousand of these
to be present, I wonder if it's worth the effort of folding
b_count into the upper bits of b_state, thus shrinking
buffer_head by 16 bits?  This would still leave 32 bits
'wasted' for further bh_state_bits expansion if necessary.
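A hypothetical bit layout for the idea, assuming a 64-bit b_state: flag bits stay at the bottom, a 16-bit reference count lives in the top 16 bits, and the 32 bits in between remain free, matching the arithmetic above. Whether 16 bits suffice for b_count is an open question, and real code would need atomic updates that cooperate with bit_spin_lock(); this only demonstrates the packing.

```c
#include <stdint.h>

/* Hypothetical layout: flags in bits 0-15, count in bits 48-63. */
#define BH_COUNT_SHIFT 48
#define BH_COUNT_ONE   (UINT64_C(1) << BH_COUNT_SHIFT)
#define BH_FLAGS_MASK  UINT64_C(0xffff)

static unsigned bh_count(uint64_t state)
{
	return (unsigned)(state >> BH_COUNT_SHIFT);
}

/* Non-atomic increment/decrement of the packed count. */
static uint64_t bh_get(uint64_t state) { return state + BH_COUNT_ONE; }
static uint64_t bh_put(uint64_t state) { return state - BH_COUNT_ONE; }

static uint64_t bh_set_flag(uint64_t state, unsigned bit)
{
	return state | (UINT64_C(1) << bit);
}
```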

Opinions?

Dave

-- 
http://www.codemonkey.org.uk


Re: [patch 2.6.20-rc1 4/6] PXA GPIO wrappers

2006-12-29 Thread David Brownell
On Friday 29 December 2006 6:15 pm, Nicolas Pitre wrote:
> On Thu, 28 Dec 2006, David Brownell wrote:
> 
> > Phillip:  is this the final version, then?  It's missing
> > a signed-off-by line, so I can't do anything appropriate.
> > 
> > Nico, your signoff here would be a Good Thing too if it
> > meets your technical review.  (My only comment, ISTR, was
> > that gpio_set_value macro should probably test for whether
> > the value is a constant too, not just the gpio pin.)
> 
> I don't think so.  Expansion of GPIO_bit(x) is pretty simple even if x 
> is not constant.  That probably makes it still less costly than a 
> function call.

I was more concerned with the "value" ... that expands to a conditional,
which is likely to cost a couple more instructions regardless, which in
space terms competes with a function call.

But I concluded much the same thing when I did that experimental
conversion ... not because I was comparing the space for conditional
vs funcall, but because the existing code already had the conditional.
If more code savings can be had later, so be it ... no rush for now.
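
The dispatch pattern being discussed can be sketched in plain GCC/Clang C:
__builtin_constant_p() lets a macro choose an inline fast path when the pin
number is a compile-time constant and fall back to an out-of-line helper
otherwise. The register is simulated with a variable, and every name here
(fake_gplr, my_gpio_get_value and its helpers) is a hypothetical stand-in,
not the PXA header.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated 32-pin level register standing in for a memory-mapped GPLR. */
static volatile uint32_t fake_gplr;

/* Inline fast path: one shift and mask, cheap even for a non-constant pin. */
static inline int my_gpio_get_inline(unsigned gpio)
{
	assert(gpio < 32);	/* the sketch models a single 32-pin bank */
	return (int)((fake_gplr >> gpio) & 1);
}

/* Out-of-line fallback, the kind of helper an arch would export. */
static __attribute__((noinline)) int my_gpio_get_outofline(unsigned gpio)
{
	return my_gpio_get_inline(gpio);
}

/* Use the inline path only when the pin is a compile-time constant. */
#define my_gpio_get_value(gpio)			\
	(__builtin_constant_p(gpio)		\
	 ? my_gpio_get_inline(gpio)		\
	 : my_gpio_get_outofline(gpio))
```

Both branches return the same result; the macro only affects code size and
call overhead, which is the tradeoff weighed above.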

- Dave



[patch] aio: add per task aio wait event condition

2006-12-29 Thread Chen, Kenneth W
The AIO wake-up notification from aio_complete() is really inefficient
in the current AIO implementation when a process is waiting in
io_getevents().

For example, if an app calls io_getevents() with min_nr > 1 and the aio
event queue doesn't have enough completed events, the process will block
in read_events().  However, aio_complete() will wake up the waiting
process for *each* completed I/O, even though the number of events the
app is waiting for is much larger than 1.  This causes excessive and
unnecessary context switches, because the waiting process will just reap
a single event and go back to sleep again.  It is much more efficient
to wake up the waiting process only when there are enough events for it
to reap.

This patch adds a wait condition to the wait queue and only wakes up the
process when that condition is met.  The condition is kept on a per-task
basis to handle multi-threaded apps that share a single ioctx.
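
To see why a per-waiter condition helps, here is a minimal single-threaded
model, not the fs/aio.c code: completions arrive one at a time, and a waiter
that needs min_nr events is either woken on every completion (the old
behaviour) or only once the queued events cover what it still needs (the
kind of condition this patch adds). The function name and the model itself
are hypothetical.

```c
#include <assert.h>

/*
 * Count how many times a waiter needing min_nr events is woken when
 * completions arrive one by one.  With conditional == 0 every completion
 * wakes the waiter; otherwise it is woken only when the queued events
 * cover what it still needs (min_nr - reaped).
 */
static int wakeups_to_reap(int min_nr, int conditional)
{
	int reaped = 0;		/* events already copied to the caller */
	int queued = 0;		/* completed events waiting in the ring */
	int wakeups = 0;

	assert(min_nr > 0);
	while (reaped < min_nr) {
		queued++;	/* one aio_complete() */
		if (!conditional || queued >= min_nr - reaped) {
			wakeups++;		/* waiter runs ... */
			reaped += queued;	/* ... and reaps what is there */
			queued = 0;
		}
	}
	return wakeups;
}
```

With min_nr = 64 the model gives 64 wake-ups without the condition and 1
with it; on a real system the ratio also depends on how many events complete
between wake-ups.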

To show the effect of this patch, here is vmstat output before and
after the patch.  The app does random O_DIRECT AIO on 60 disks.  Context
switches are reduced from 13 thousand+ to just 40+, a significant
improvement.

Before:
procs ---memory-- ---swap-- -io --system-- cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0  0 3972608   7056  3131200 14000 0 7840 13715  0  2 98  0
 0  0  0 3972608   7056  3131200 14300 0 7793 13641  0  2 98  0
 0  0  0 3972608   7056  3131200 14100 0 7885 13747  0  2 98  0

After:
 0  0  0 3972608   7056  3131200 14000 0 7840    49  0  2 98  0
 0  0  0 3972608   7056  3131200 13800 0 7793    53  0  2 98  0
 0  0  0 3972608   7056  3131200 13800 0 7885    42  0  2 98  0


Signed-off-by: Ken Chen <[EMAIL PROTECTED]>


--- ./fs/aio.c.orig 2006-12-24 16:41:39.0 -0800
+++ ./fs/aio.c  2006-12-24 16:52:15.0 -0800
@@ -193,6 +193,17 @@ static int aio_setup_ring(struct kioctx 
kunmap_atomic((void *)((unsigned long)__event & PAGE_MASK), km); \
 } while(0)
 
+struct aio_wait_queue {
+   int nr_wait;	/* wake-up condition */
+   wait_queue_t	wait;
+};
+
+static inline void aio_init_wait(struct aio_wait_queue *wait)
+{
+   wait->nr_wait = 0;
+   init_wait(&wait->wait);
+}
+
 /* ioctx_alloc
  * Allocates and initializes an ioctx.  Returns an ERR_PTR if it failed.
  */
@@ -296,13 +307,14 @@ static void aio_cancel_all(struct kioctx
 static void wait_for_all_aios(struct kioctx *ctx)
 {
struct task_struct *tsk = current;
-   DECLARE_WAITQUEUE(wait, tsk);
+   struct aio_wait_queue wait;
 
spin_lock_irq(&ctx->ctx_lock);
if (!ctx->reqs_active)
goto out;
 
-   add_wait_queue(&ctx->wait, &wait);
+   aio_init_wait(&wait);
+   add_wait_queue(&ctx->wait, &wait.wait);
set_task_state(tsk, TASK_UNINTERRUPTIBLE);
while (ctx->reqs_active) {
spin_unlock_irq(&ctx->ctx_lock);
@@ -311,7 +323,7 @@ static void wait_for_all_aios(struct kio
spin_lock_irq(&ctx->ctx_lock);
}
__set_task_state(tsk, TASK_RUNNING);
-   remove_wait_queue(&ctx->wait, &wait);
+   remove_wait_queue(&ctx->wait, &wait.wait);
 
 out:
spin_unlock_irq(&ctx->ctx_lock);
@@ -932,6 +944,7 @@ int fastcall aio_complete(struct kiocb *
unsigned long   flags;
unsigned long   tail;
int ret;
+   int nr_evt = 0;
 
/*
 * Special case handling for sync iocbs:
@@ -992,6 +1005,9 @@ int fastcall aio_complete(struct kiocb *
info->tail = tail;
ring->tail = tail;
 
+   nr_evt = ring->tail - ring->head;
+   if (nr_evt < 0)
+   nr_evt += info->nr;
put_aio_ring_event(event, KM_IRQ0);
kunmap_atomic(ring, KM_IRQ1);
 
@@ -1000,8 +1016,13 @@ put_rq:
/* everything turned out well, dispose of the aiocb. */
ret = __aio_put_req(ctx, iocb);
 
-   if (waitqueue_active(&ctx->wait))
-   wake_up(&ctx->wait);
+   if (waitqueue_active(&ctx->wait)) {
+   struct aio_wait_queue *wait;
+   wait = container_of(ctx->wait.task_list.next,
+   struct aio_wait_queue, wait.task_list);
+   if (nr_evt >= wait->nr_wait)
+   wake_up(&ctx->wait);
+   }
 
spin_unlock_irqrestore(&ctx->ctx_lock, flags);
return ret;
@@ -1094,7 +1115,7 @@ static int read_events(struct kioctx *ct
 {
long	start_jiffies = jiffies;
struct task_struct  *tsk = current;
-   DECLARE_WAITQUEUE(wait, tsk);
+   struct aio_wait_queue   wait;
int ret;
int i = 0;
struct io_event ent;
@@ -1152,10 +1173,11 @@ retry:
set_timeout(start_jiffies, &to, &ts);
}

Re: [patch 2.6.20-rc1 5/6] SA1100 GPIO wrappers

2006-12-29 Thread David Brownell
Here's a version that compiles ...
From: Philipp Zabel <[EMAIL PROTECTED]>

Arch-neutral GPIO calls for SA1100.

Signed-off-by: David Brownell <[EMAIL PROTECTED]>

Index: pxa/include/asm-arm/arch-sa1100/gpio.h
===
--- /dev/null	1970-01-01 00:00:00.0 +
+++ pxa/include/asm-arm/arch-sa1100/gpio.h	2006-12-29 18:21:00.0 -0800
@@ -0,0 +1,100 @@
+/*
+ * linux/include/asm-arm/arch-sa1100/gpio.h
+ *
+ * SA1100 GPIO wrappers for arch-neutral GPIO calls
+ *
+ * Written by Philipp Zabel <[EMAIL PROTECTED]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ */
+
+#ifndef __ASM_ARCH_SA1100_GPIO_H
+#define __ASM_ARCH_SA1100_GPIO_H
+
+#include 
+#include 
+
+#include 
+
+static inline int gpio_request(unsigned gpio, const char *label)
+{
+	return 0;
+}
+
+static inline void gpio_free(unsigned gpio)
+{
+	return;
+}
+
+static inline int gpio_direction_input(unsigned gpio)
+{
+	if (gpio > GPIO_MAX)
+		return -EINVAL;
+	GPDR &= ~GPIO_GPIO(gpio);
+	return 0;
+}
+
+static inline int gpio_direction_output(unsigned gpio)
+{
+	if (gpio > GPIO_MAX)
+		return -EINVAL;
+	GPDR |= GPIO_GPIO(gpio);
+	return 0;
+}
+
+extern int sa1100_gpio_get_value(unsigned gpio);
+extern void sa1100_gpio_set_value(unsigned gpio, int value);
+
+static inline int __gpio_get_value(unsigned gpio)
+{
+	return GPLR & GPIO_GPIO(gpio);
+}
+
+#define gpio_get_value(gpio)			\
+	(__builtin_constant_p(gpio)		\
+	? __gpio_get_value(gpio)		\
+	: sa1100_gpio_get_value(gpio))
+
+static inline void __gpio_set_value(unsigned gpio, int value)
+{
+	if (value)
+		GPSR = GPIO_GPIO(gpio);
+	else
+		GPCR = GPIO_GPIO(gpio);
+}
+
+#define gpio_set_value(gpio,value)		\
+	(__builtin_constant_p(gpio)		\
+	? __gpio_set_value(gpio, value)		\
+	: sa1100_gpio_set_value(gpio, value))
+
+static inline unsigned gpio_to_irq(unsigned gpio)
+{
+	if (gpio < 11)
+		return IRQ_GPIO0 + gpio;
+	else
+		return IRQ_GPIO11 - 11 + gpio;
+}
+
+static inline unsigned irq_to_gpio(unsigned irq)
+{
+	if (irq < IRQ_GPIO11_27)
+		return irq - IRQ_GPIO0;
+	else
+		return irq - IRQ_GPIO11 + 11;
+}
+
+#endif
Index: pxa/arch/arm/mach-sa1100/generic.c
===
--- pxa.orig/arch/arm/mach-sa1100/generic.c	2006-12-10 01:30:42.0 -0800
+++ pxa/arch/arm/mach-sa1100/generic.c	2006-12-29 17:46:47.0 -0800
@@ -28,6 +28,8 @@
 #include 
 #include 
 
+#include 
+
 #include "generic.h"
 
 #define NR_FREQS	16
@@ -139,6 +141,26 @@ unsigned long long sched_clock(void)
 }
 
 /*
+ * Return GPIO level
+ */
+int sa1100_gpio_get_value(unsigned gpio)
+{
+	return __gpio_get_value(gpio);
+}
+
+EXPORT_SYMBOL(sa1100_gpio_get_value);
+
+/*
+ * Set output GPIO level
+ */
+void sa1100_gpio_set_value(unsigned gpio, int value)
+{
+	__gpio_set_value(gpio, value);
+}
+
+EXPORT_SYMBOL(sa1100_gpio_set_value);
+
+/*
  * Default power-off for SA1100
  */
 static void sa1100_power_off(void)
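
A direction register holds one bit per pin, so changing one pin's direction
needs a read-modify-write; a plain assignment such as
GPDR = (GPDR_In << gpio) rewrites every pin at once. The sketch below
simulates the register with a variable (fake_gpdr and the sim_* helpers are
hypothetical). On real hardware the update would also need protection
against interrupts, for example with local_irq_save(), which a userspace
sketch cannot show.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define FAKE_GPIO_MAX 27	/* assumed to match the SA1100 GPIO_MAX (pins 0..27) */

static uint32_t fake_gpdr;	/* bit set = output, bit clear = input */

static int sim_gpio_direction_input(unsigned gpio)
{
	if (gpio > FAKE_GPIO_MAX)
		return -EINVAL;
	fake_gpdr &= ~(1u << gpio);	/* clear only this pin's bit */
	return 0;
}

static int sim_gpio_direction_output(unsigned gpio)
{
	if (gpio > FAKE_GPIO_MAX)
		return -EINVAL;
	fake_gpdr |= 1u << gpio;	/* set only this pin's bit */
	return 0;
}
```

Setting pins 3 and 7 as outputs and then pin 3 back to input leaves only
bit 7 set, whereas an assignment would have cleared pin 7's direction too.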


Re: [patch 2.6.20-rc1 4/6] PXA GPIO wrappers

2006-12-29 Thread Nicolas Pitre
On Thu, 28 Dec 2006, David Brownell wrote:

> Phillip:  is this the final version, then?  It's missing
> a signed-off-by line, so I can't do anything appropriate.
> 
> Nico, your signoff here would be a Good Thing too if it
> meets your technical review.  (My only comment, ISTR, was
> that gpio_set_value macro should probably test for whether
> the value is a constant too, not just the gpio pin.)

I don't think so.  Expansion of GPIO_bit(x) is pretty simple even if x 
is not constant.  That probably makes it still less costly than a 
function call.
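
For reference, the pin-to-bit mapping is just a mask and a shift, which is
why the non-constant case stays cheap. This sketch assumes 32 pins per
register bank, matching the shape of the PXA GPIO_bit() definition; the
sketch_* names are hypothetical, and sketch_gpio_bank() only stands in for
the register-selection part of the GPLR()-style macros.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as PXA's GPIO_bit(): pin number modulo 32 selects the bit. */
#define SKETCH_GPIO_BIT(x)	(1u << ((x) & 0x1f))

/* Hypothetical helper: which 32-bit register bank holds pin x. */
static inline unsigned sketch_gpio_bank(unsigned x)
{
	return x >> 5;
}

/* Read one pin from an array of simulated level registers. */
static inline int sketch_gpio_level(const uint32_t *banks, unsigned x)
{
	return (banks[sketch_gpio_bank(x)] & SKETCH_GPIO_BIT(x)) != 0;
}
```

For a variable pin the compiler emits roughly an AND, a shift and a load,
which is the cost being compared against a function call above.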


Nicolas


Re: OpenAFS gatekeepers request addition of AFS_SUPER_MAGIC to magic.h

2006-12-29 Thread Adam Megacz

Drat.  Diffed in the wrong direction.  Yes, you're right.

  - a

Stephen Frost <[EMAIL PROTECTED]> writes:
> * Adam Megacz ([EMAIL PROTECTED]) wrote:
>> --- include/linux/magic.h   2006-12-29 15:48:50.0 -0800
>> +++ include/linux/magic.h   2006-11-29 13:57:37.0 -0800
>> @@ -3,7 +3,6 @@
>>  
>>  #define ADFS_SUPER_MAGIC   0xadf5
>>  #define AFFS_SUPER_MAGIC   0xadff
>> -#define AFS_SUPER_MAGIC    0x5346414F
>>  #define AUTOFS_SUPER_MAGIC 0x0187
>>  #define CODA_SUPER_MAGIC   0x73757245
>>  #define EFS_SUPER_MAGIC    0x414A53
>
> Wouldn't you want a patch which *adds* it, rather than one which
> *removes* it...?
>
>   Thanks,
>
>   Stephen
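
For context on why the constant is wanted in magic.h: userspace tools
typically identify a filesystem by comparing the f_type field returned by
statfs() against these magic numbers. A minimal Linux-only sketch, using the
0x5346414F value from the quoted diff (path_is_afs() is a hypothetical
helper):

```c
#include <assert.h>
#include <sys/vfs.h>

#define SKETCH_AFS_SUPER_MAGIC 0x5346414F	/* value from the quoted diff */

/* Returns 1 if path is on AFS, 0 if not, -1 if statfs() fails. */
static int path_is_afs(const char *path)
{
	struct statfs sfs;

	if (statfs(path, &sfs) != 0)
		return -1;
	return sfs.f_type == SKETCH_AFS_SUPER_MAGIC;
}
```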

-- 
PGP/GPG: 5C9F F366 C9CF 2145 E770  B1B8 EFB1 462D A146 C380



Re: 2.6.20-rc2: known unfixed regressions

2006-12-29 Thread Adrian Bunk
On Fri, Dec 29, 2006 at 10:21:36PM -0300, Horst H. von Brand wrote:
> Adrian Bunk <[EMAIL PROTECTED]> wrote:
> 
> [...]
> 
> > Subject: BUG at fs/buffer.c:1235 when using gdb
> > References : http://lkml.org/lkml/2006/12/17/134
> > Submitter  : Andrew J. Barr <[EMAIL PROTECTED]>
> > Fixed-By   : Jeremy Fitzhardinge <[EMAIL PROTECTED]>
> > Commit : 8701ea957dd2a7c309e17c8dcde3a64b92d8aec0
> > Status : fixed in -rc2
> 
> This I see in Fedora rawhide i686 2.6.19-1.2891.fc7 (BZ'd at
> 

2.6.19-1.2891.fc7 is based on 2.6.20-rc1-git5, and it's therefore 
expected that it contains this bug.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: [BUG KERNEL 2.6.20-rc1] ftp: get or put stops during file-transfer

2006-12-29 Thread YOSHIFUJI Hideaki / 吉藤英明
In article <[EMAIL PROTECTED]> (at Sat, 30 Dec 2006 18:50:43 +0900), Komuro 
<[EMAIL PROTECTED]> says:

> I investigated the ftp-file-transfer-stop problem by git-bisect method,
> and found this problem was introduced by
> "[TCP]: MD5 Signature Option (RFC2385) support" patch.
> 
> Mr.YOSHIFUJI san, please fix this problem.

Hmm, have you tried disabling CONFIG_TCP_MD5SIG?
(Is it already disabled?)

Is there a specific transfer size that reproduces this?
Do you see a similar issue with other simple applications?

--yoshfuji


Oops in 2.6.19-rc6-mm2 on USB disconnect

2006-12-29 Thread Eric Buddington
Kernel 2.6.19-rc6-mm2 on an Athlon XP gave me the following Oops when
unplugging a USB device. I can usually plug and unplug devices without
trouble, so this is probably not easily repeatable.

-Eric

--

[1742504.966893] usb 1-6.4.3: USB disconnect, address 16
[1742510.144326] usb 1-6.4.2: USB disconnect, address 15
[1742510.173818] BUG: unable to handle kernel NULL pointer dereference at virtual address 000c
[1742510.173827]  printing eip:
[1742510.173829] c03eeb2a
[1742510.173832] *pde = 
[1742510.173838] Oops:  [#1]
[1742510.173840] PREEMPT 
[1742510.173844] last sysfs file: /devices/pci:00/:00:03.2/usb1/1-6/1-6.4/1-6.4.2/product
[1742510.173848] Modules linked in: fuse ppp_synctty ppp_async crc_ccitt ppp_generic slhc r128 drm softdog capability commoncap sch_tbf raw1394 dv1394 ohci1394 ieee1394 snd_ice1712 snd_ice17xx_ak4xxx keyspan_pda snd_ak4xxx_adda snd_cs8427 snd_i2c snd_mpu401_uart snd_rawmidi usbserial usbhid ff_memless sg usb_storage usbnet ohci_hcd uhci_hcd ehci_hcd snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_seq_oss snd_seq_device snd_seq_midi_event snd_pcm_oss snd_pcm snd_page_alloc snd_seq snd_mixer_oss snd_timer snd soundcore ipt_tos ipt_owner iptable_nat ipt_MASQUERADE ip_nat joydev ip_conntrack tsdev nfnetlink ipt_TOS iptable_filter iptable_mangle ip_tables x_tables 8139too sis900 mii usbmouse usbcore psmouse sis5595 hwmon i2c_isa i2c_core sis_agp agpgart ide_scsi
[1742510.173914] CPU:0
[1742510.173915] EIP:0060:[]Tainted: G   M  VLI
[1742510.173917] EFLAGS: 00010286   (2.6.19-rc6-mm2 #1)
[1742510.173931] EIP is at klist_del+0x9/0x49
[1742510.173935] eax:    ebx: e887a400   ecx: e887a5f8   edx: 0063
[1742510.173940] esi: e887a4a0   edi: c4e17214   ebp: f5e59e30   esp: f5e59e28
[1742510.173943] ds: 007b   es: 007b   ss: 0068
[1742510.173948] Process khubd (pid: 961, ti=f5e58000 task=f5dd7220 task.ti=f5e58000)
[1742510.173951] Stack: e887a400 e887a490 f5e59e44 c02d44ec e887a400 e887a490 f8b03620 f5e59e54 
[1742510.173959]        c030501d e887a400 d7015400 f5e59e64 c03047b4 0246 d7015400 f5e59e74 
[1742510.173967]        c02fdac6 d70156bc f63bde18 f5e59e84 f8af27ff  d70156bc f5e59e90 
[1742510.173975] Call Trace:
[1742510.173993]  [] device_del+0x17/0x161
[1742510.174009]  [] __scsi_remove_device+0x34/0x62
[1742510.174021]  [] scsi_forget_host+0x5e/0x9a
[1742510.174029]  [] scsi_remove_host+0xad/0x14b
[1742510.174041]  [] quiesce_and_remove_host+0xcf/0xd3 [usb_storage]
[1742510.174089]  [] storage_disconnect+0x11/0x1b [usb_storage]
[1742510.174106]  [] usb_unbind_interface+0x41/0x81 [usbcore]
[1742510.174175]  [] __device_release_driver+0x71/0x86
[1742510.174183]  [] device_release_driver+0x2f/0x45
[1742510.174190]  [] bus_remove_device+0x5e/0x6c
[1742510.174196]  [] device_del+0x10b/0x161
[1742510.174203]  [] usb_disable_device+0x5f/0xbc [usbcore]
[1742510.174233]  [] usb_disconnect+0x94/0x134 [usbcore]
[1742510.174256]  [] hub_thread+0x38c/0xa8c [usbcore]
[1742510.174280]  [] kthread+0xa3/0xcf
[1742510.174294]  [] kernel_thread_helper+0x7/0x10
[1742510.174305] DWARF2 unwinder stuck at kernel_thread_helper+0x7/0x10
[1742510.174311] Leftover inexact backtrace:
[1742510.174314]  [] show_trace_log_lvl+0x1a/0x2f
[1742510.174319]  [] show_stack_log_lvl+0x9b/0xa3
[1742510.174323]  [] show_registers+0x19d/0x2b8
[1742510.174328]  [] die+0x127/0x202
[1742510.174332]  [] do_page_fault+0x430/0x4fd
[1742510.174339]  [] error_code+0x74/0x7c
[1742510.174346]  [] device_del+0x17/0x161
[1742510.174351]  [] __scsi_remove_device+0x34/0x62
[1742510.174356]  [] scsi_forget_host+0x5e/0x9a
[1742510.174360]  [] scsi_remove_host+0xad/0x14b
[1742510.174365]  [] quiesce_and_remove_host+0xcf/0xd3 [usb_storage]
[1742510.174378]  [] storage_disconnect+0x11/0x1b [usb_storage]
[1742510.174389]  [] usb_unbind_interface+0x41/0x81 [usbcore]
[1742510.174407]  [] __device_release_driver+0x71/0x86
[1742510.174412]  [] device_release_driver+0x2f/0x45
[1742510.174416]  [] bus_remove_device+0x5e/0x6c
[1742510.174420]  [] device_del+0x10b/0x161
[1742510.174425]  [] usb_disable_device+0x5f/0xbc [usbcore]
[1742510.174443]  [] usb_disconnect+0x94/0x134 [usbcore]
[1742510.174460]  [] hub_thread+0x38c/0xa8c [usbcore]
[1742510.174477]  [] kthread+0xa3/0xcf
[1742510.174481]  [] kernel_thread_helper+0x7/0x10
[1742510.174486]  ===
[1742510.174488] Code: 89 10 8d 43 04 c7 43 f8 00 01 10 00 c7 41 04 00 02 20 00 e8 b3 79 d2 ff c7 43 f4 00 00 00 00 5b c9 c3 55 89 e5 56 89 c6 53 8b 00 <8b> 58 0c 89 e0 25 00 e0 ff ff ff 40 14 89 f0 e8 9d ff ff ff 85 
[1742510.174517] EIP: [] klist_del+0x9/0x49 SS:ESP 0068:f5e59e28
[1742510.174524]  <5>scsi 6:0:0:0: Attached scsi generic sg0 type 3


[patch] aio: fix buggy put_ioctx call in aio_complete - v2

2006-12-29 Thread Chen, Kenneth W
An AIO bug was reported in which a sleeping function is called in softirq
context:

BUG: warning at kernel/mutex.c:132/__mutex_lock_common()
Call Trace:
 [] __mutex_lock_slowpath+0x640/0x6c0
 [] mutex_lock+0x20/0x40
 [] flush_workqueue+0xb0/0x1a0
 [] __put_ioctx+0xc0/0x240
 [] aio_complete+0x2f0/0x420
 [] finished_one_bio+0x200/0x2a0
 [] dio_bio_complete+0x1c0/0x200
 [] dio_bio_end_aio+0x60/0x80
 [] bio_endio+0x110/0x1c0
 [] __end_that_request_first+0x180/0xba0
 [] end_that_request_chunk+0x30/0x60
 [] scsi_end_request+0x50/0x300 [scsi_mod]
 [] scsi_io_completion+0x200/0x8a0 [scsi_mod]
 [] sd_rw_intr+0x330/0x860 [sd_mod]
 [] scsi_finish_command+0x100/0x1c0 [scsi_mod]
 [] scsi_softirq_done+0x230/0x300 [scsi_mod]
 [] blk_done_softirq+0x160/0x1c0
 [] __do_softirq+0x200/0x240
 [] do_softirq+0x70/0xc0

See report: http://marc.theaimsgroup.com/?l=linux-kernel&m=116599593200888&w=2

flush_workqueue() must not be called in softirq context.
However, aio_complete() called from an I/O interrupt can potentially call
put_ioctx() while holding the last reference to the ioctx, which triggers
the bug.  It is simply incorrect to free the ioctx from aio_complete().

The bug can be triggered by a race between io_destroy() and aio_complete().
A possible scenario:

cpu0                            cpu1
io_destroy                      aio_complete
  wait_for_all_aios {             __aio_put_req
     ...                             ctx->reqs_active--;
     if (!ctx->reqs_active)
        return;
  }
  ...
  put_ioctx(ioctx)

                                  put_ioctx(ctx);
                                    __put_ioctx
                                      bam! Bug trigger!

The real problem is that the check of ctx->reqs_active in
wait_for_all_aios() is incorrect: access to reqs_active is not
protected by the spin lock.

This patch adds that protective spin lock and, at the same time, removes
the duplicate ref counting for each kiocb, since reqs_active is already
used as a ref count for each active ioctx.  This also ensures that the
buggy call to flush_workqueue() in softirq context is eliminated.
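
The locking rule the patch enforces (test reqs_active only with the lock
held, drop the lock to sleep, re-test after waking) is the same pattern
userspace code follows with a mutex and a condition variable. This is a
pthreads analogue, not kernel code; struct sketch_ctx and the sketch_*
helpers are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct sketch_ctx {
	pthread_mutex_t lock;
	pthread_cond_t  wait;
	int             reqs_active;	/* protected by lock */
};

/* Completion side: drop one request and wake waiters when idle. */
static void sketch_complete_one(struct sketch_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	assert(ctx->reqs_active > 0);
	if (--ctx->reqs_active == 0)
		pthread_cond_broadcast(&ctx->wait);
	pthread_mutex_unlock(&ctx->lock);
}

/* Teardown side: the counter is only ever read with the lock held. */
static void sketch_wait_for_all(struct sketch_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	while (ctx->reqs_active)
		pthread_cond_wait(&ctx->wait, &ctx->lock);
	pthread_mutex_unlock(&ctx->lock);
}

/* Thread body that completes every outstanding request, one at a time. */
static void *sketch_completer(void *arg)
{
	struct sketch_ctx *ctx = arg;
	int n;

	pthread_mutex_lock(&ctx->lock);
	n = ctx->reqs_active;
	pthread_mutex_unlock(&ctx->lock);
	while (n-- > 0)
		sketch_complete_one(ctx);
	return NULL;
}
```

Because the waiter checks the counter under the same lock the completer
uses to decrement it, it cannot observe zero and tear down while a
completion is still in flight in the way the race above shows.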


Signed-off-by: Ken Chen <[EMAIL PROTECTED]>

--- ./fs/aio.c.orig 2006-12-21 08:08:14.0 -0800
+++ ./fs/aio.c  2006-12-21 08:14:27.0 -0800
@@ -298,17 +298,23 @@ static void wait_for_all_aios(struct kio
struct task_struct *tsk = current;
DECLARE_WAITQUEUE(wait, tsk);
 
+   spin_lock_irq(&ctx->ctx_lock);
if (!ctx->reqs_active)
-   return;
+   goto out;
 
add_wait_queue(&ctx->wait, &wait);
set_task_state(tsk, TASK_UNINTERRUPTIBLE);
while (ctx->reqs_active) {
+   spin_unlock_irq(&ctx->ctx_lock);
schedule();
set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+   spin_lock_irq(&ctx->ctx_lock);
}
__set_task_state(tsk, TASK_RUNNING);
remove_wait_queue(&ctx->wait, &wait);
+
+out:
+   spin_unlock_irq(&ctx->ctx_lock);
 }
 
 /* wait_on_sync_kiocb:
@@ -424,7 +430,6 @@ static struct kiocb fastcall *__aio_get_
ring = kmap_atomic(ctx->ring_info.ring_pages[0], KM_USER0);
if (ctx->reqs_active < aio_ring_avail(&ctx->ring_info, ring)) {
list_add(&req->ki_list, &ctx->active_reqs);
-   get_ioctx(ctx);
ctx->reqs_active++;
okay = 1;
}
@@ -536,8 +541,6 @@ int fastcall aio_put_req(struct kiocb *r
spin_lock_irq(&ctx->ctx_lock);
ret = __aio_put_req(ctx, req);
spin_unlock_irq(&ctx->ctx_lock);
-   if (ret)
-   put_ioctx(ctx);
return ret;
 }
 
@@ -782,8 +785,7 @@ static int __aio_run_iocbs(struct kioctx
 */
iocb->ki_users++;   /* grab extra reference */
aio_run_iocb(iocb);
-   if (__aio_put_req(ctx, iocb))  /* drop extra ref */
-   put_ioctx(ctx);
+   __aio_put_req(ctx, iocb);
}
if (!list_empty(&ctx->run_list))
return 1;
@@ -998,14 +1000,10 @@ put_rq:
/* everything turned out well, dispose of the aiocb. */
ret = __aio_put_req(ctx, iocb);
 
-   spin_unlock_irqrestore(&ctx->ctx_lock, flags);
-
if (waitqueue_active(&ctx->wait))
wake_up(&ctx->wait);
 
-   if (ret)
-   put_ioctx(ctx);
-
+   spin_unlock_irqrestore(&ctx->ctx_lock, flags);
return ret;
 }
 


Re: 2.6.20-rc2: known unfixed regressions

2006-12-29 Thread Horst H. von Brand
Adrian Bunk <[EMAIL PROTECTED]> wrote:

[...]

> Subject: BUG at fs/buffer.c:1235 when using gdb
> References : http://lkml.org/lkml/2006/12/17/134
> Submitter  : Andrew J. Barr <[EMAIL PROTECTED]>
> Fixed-By   : Jeremy Fitzhardinge <[EMAIL PROTECTED]>
> Commit : 8701ea957dd2a7c309e17c8dcde3a64b92d8aec0
> Status : fixed in -rc2

This I see in Fedora rawhide i686 2.6.19-1.2891.fc7 (BZ'd at

-- 
Dr. Horst H. von Brand   User #22616 counter.li.org
Departamento de Informatica              Fono: +56 32 2654431
Universidad Tecnica Federico Santa Maria +56 32 2654239
Casilla 110-V, Valparaiso, Chile   Fax:  +56 32 2797513


Re: [patch 2.6.20-rc1 1/6] GPIO core

2006-12-29 Thread David Brownell
On Thursday 28 December 2006 4:27 pm, Pavel Machek wrote:
> 
> > > > +GPIOs are identified by unsigned integers in the range 0..MAX_INT
> > > 
> > > Perhaps these should not be integers, then?
> > 
> > Thing is, the platforms **DO** identify them as integers.
> > ...
> 
> Well, when you see (something) = gpio_number + 5 ... you most likely
> have an error.

One could surely apply that argument to hundreds of places throughout
the kernel ... that doesn't make it a good one.  One of the downfalls
of many "object oriented programming" efforts was this same desire to
encapsulate things that don't need it; it's a loss, not a don't-care.

Think of it as "cookies represented by integers" if you like.


> No, that's a wrong way. I want you to admit that gpio numbers are
> opaque cookies no one should look at, and use (something like)
> gpio_t... so that we can teach sparse to check them.

You're welcome to dream on.  :)

The goal here is not to create new complexity, it's to wrap the
current widely used abstraction (gpios are integers, with get/set
primitives and a direction) in a neutral programming interface
that's very easy to map to/from the current arch-specific ones.

So that various drivers can get on with the business of being
generally useful, rather than arch-specific; or at least being
easier to read.  See the example PXA patch I recently posted;
it's a code shrink, and *direct* translation from the current
GPIO interface (which uses integers).
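
To make the "gpios are integers, with get/set primitives and a direction"
model concrete, here is a hypothetical driver-facing sequence against a
simulated backend. The demo_* names are made up; only the shape follows the
API described in this thread: errors are reported once, when the direction
is set, and get/set have no error returns, with at most a debug-only check.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_NR_GPIOS 64	/* hypothetical number of pins */

static unsigned char demo_is_output[DEMO_NR_GPIOS];
static unsigned char demo_level[DEMO_NR_GPIOS];

/* Invalid pin numbers are reported here, once, when the pin is set up. */
static int demo_gpio_direction_output(unsigned gpio)
{
	if (gpio >= DEMO_NR_GPIOS)
		return -EINVAL;
	demo_is_output[gpio] = 1;
	return 0;
}

static int demo_gpio_direction_input(unsigned gpio)
{
	if (gpio >= DEMO_NR_GPIOS)
		return -EINVAL;
	demo_is_output[gpio] = 0;
	return 0;
}

/*
 * No error returns: callers already validated the pin via a direction
 * call.  The assert() is a debug-only check and compiles away when
 * NDEBUG is defined.
 */
static void demo_gpio_set_value(unsigned gpio, int value)
{
	assert(gpio < DEMO_NR_GPIOS);
	demo_level[gpio] = value != 0;
}

static int demo_gpio_get_value(unsigned gpio)
{
	assert(gpio < DEMO_NR_GPIOS);
	return demo_level[gpio];
}
```

A driver would call the direction function once at probe time and check
its result, then use get/set freely in its hot paths.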


> > > > +The get/set calls have no error returns because "invalid GPIO" should have
> > > > +been reported earlier in gpio_set_direction().  However, note that not all
> > > > +platforms can read the value of output pins; those that can't should always
> > > > +return zero.  Also, these calls will be ignored for GPIOs that can't safely
> > > > +be accessed without sleeping (see below).
> > > 
> > > 'Silently ignored' is ugly. BUG() would be okay there.
> > 
> > The reason for "silently ignored" is that we really don't want to be
> > cluttering up the code (source or object) with logic to test for this
> > kind of "can't happen" failure, especially since there's not going to
> > be any way to _resolve_ such failures cleanly.
> 
> You may not want to clutter up code for one arch, but for some of them
> maybe it is okay and welcome. Please do not document "silently
> ignored" into API.

Those words were yours; so you can consider that already done.
Should it instead say that's an (obviously unchecked) error?

The "no error returns" was an explicit request from several folk
during earlier API discussions.  People actually _using_ GPIOs have
no use for faults on those known-valid get/set calls.  Seriously,
exactly how could you ever recover from register access no longer
working correctly?  The chip is in the process of exploding, or
being crushed; what could software possibly do to recover?


> > And per Linus' rule about BUG(), "silently ignored" is clearly better
> > than needlessly stopping the whole system.
> 
> You are perverting what Linus said. "Do not bother detecting errors"
> is not what he had in mind.. but perhaps it should be WARN() not
> BUG().

You are perverting what _I_ said.  (As you've done before; stop that.)

It's very clear that I was talking about a tradeoff ("better than"), and
pointing out how Linus' rule made it clear that your proposal was on the
wrong end of things.  (His rule being that BUG should not be used unless
the system really can't continue operating.)

In terms of API specs, emitting any warning is traditionally out-of-scope.
Because "of course" it's legit for debug modes to do all kinds of things,
including emitting warnings; and likewise it's legit for non-debug modes
to do nothing not absolutely required.  And programming interface specs
have no business in "quality of implementation" issues like whether any
implementation even _has_ a debug mode, much less what it covers.


> > > > +... It is an unchecked error to use a GPIO
> > > > +number that hasn't been marked as an input using gpio_set_direction(), or
> > > 
> > > It should be valid to do irqs on outputs,
> > 
> > Good point -- it _might_ be valid to do that, on some platforms.
> > Such things have been used as a software IRQ trigger, which can
> > make bootstrapping some things easier.
> > 
> > That's not incompatible with it being an error for portable code to
> > try that, or with refusing to check it so that those platforms don't
> > needlessly cause trouble!
> 
> I believe your text suggests it _is_ incompatible. Plus that seems to
> mean that an architecture must not check for that error...

Which -- that portable code mustn't try such things?  That seems clearly
wrong; that's what the "is an error" phrase means.  Or that code should
not need an obscure API for nonportable tricks like that?  That seems
wrong too; that's one of the reasons to specify things as "unchecked".
Or that implementations shouldn't be required to

Re: Ok, explained.. (was Re: [PATCH] mm: fix page_mkclean_one)

2006-12-29 Thread Andrew Morton
On Fri, 29 Dec 2006 16:58:41 -0800 (PST)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> 
> 
> On Fri, 29 Dec 2006, Andrew Morton wrote:
> >
> > > > Somewhat nastily, but as ext3 directories are metadata it is appropriate
> > > > that modifications to them be done in terms of buffer_heads (ie: blocks).
> > > 
> > > No. There is nothing "appropriate" about using buffer_heads for metadata. 
> > 
> > I said "modification".
> 
> You said "metadata".
> 
> Why do you think directories are any different from files? Yes, they are 
> metadata. So what? What does that have to do with anything?

We journal the contents of directories.  Fully.  So we handle their dirty
data at the block (ie: buffer_head) level.  When someone tries to dirty
part of a directory, we need to cheat: rather than marking that part of
the page dirty, we write the block to the journal and then mark the block
as really dirty for checkpointing (while it is still attached to the
journal), and all that goop.

The regular page-based writeback doesn't apply until the block has been
written to the journal.  At that stage the block is considered dirty
against its real position on disk.  It will then be written back by pdflush
via the blockdev inode -> blkdev_writepage().  Unless kjournald needs to do
an early flush to reclaim the journal space, in which case kjournald will
write the block itself.

> 
> So I really don't understand why you make excuses for ext3 and talk about 
> "modifications" and "metadata". It was a fine design ten years ago. It's 
> not really very good any longer.
> 

As I said in another apparently-neglected email:

: We could possibly move ext3/4 directories out of the blockdev pagecache and
: into per-directory pagecache, but that wouldn't change anything - the
: journalling would still be block-based.

We already have all the code in place to journal blocks which are cached in
an address_space other than the blockdev inode's: ext3_journalled_aops.


Re: [patch 2.6.20-rc1 4/6] PXA GPIO wrappers

2006-12-29 Thread David Brownell
Just FYI -- I updated your patch, fixed a compile bug, and switched
some code over to use this new API.  The patch is appended.

I happen to think it's a lot easier to read this way.  Maybe to some
people it's easy to remember what a GPLR, GPCR, and GPSR register is
supposed to do, after a long time away from PXA or StrongARM platform
code; I'm not one of them.

And on a side note, yes it would make sense for someone to update the
GPIO IRQ support to properly manage PWER, so there's less need for
board-specific PM glue code.  :)

- Dave



This is an UNTESTED bunch of conversions of PXA code to use the new GPIO
interfaces, and other build/warning fixes.  It's not complete, or even
fully reviewed; but it builds.

Note that the idioms in the API are, as with other architectures, a very
direct match for the existing code ... and so the conversions are easy
to do and to review.

 arch/arm/mach-pxa/corgi.c           |   13 -
 arch/arm/mach-pxa/corgi_lcd.c       |    8 ++--
 arch/arm/mach-pxa/corgi_pm.c        |   25 ++---
 arch/arm/mach-pxa/corgi_ssp.c       |   18 ++
 arch/arm/mach-pxa/sharpsl.h         |    6 --
 arch/arm/mach-pxa/spitz_pm.c        |    6 +++---
 drivers/usb/gadget/pxa2xx_udc.c     |   20 ++--
 drivers/usb/gadget/pxa2xx_udc.h     |   21 ++---
 drivers/video/backlight/corgi_bl.c  |    2 +-
 drivers/video/backlight/locomolcd.c |    3 ++-
 10 files changed, 52 insertions(+), 70 deletions(-)

Index: pxa/arch/arm/mach-pxa/corgi.c
===
--- pxa.orig/arch/arm/mach-pxa/corgi.c  2006-12-10 01:30:42.0 -0800
+++ pxa/arch/arm/mach-pxa/corgi.c   2006-12-29 16:44:15.0 -0800
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -239,15 +240,12 @@ static void corgi_mci_setpower(struct de
 {
struct pxamci_platform_data* p_d = dev->platform_data;
 
-   if (( 1 << vdd) & p_d->ocr_mask)
-   GPSR1 = GPIO_bit(CORGI_GPIO_SD_PWR);
-   else
-   GPCR1 = GPIO_bit(CORGI_GPIO_SD_PWR);
+   gpio_set_value(CORGI_GPIO_SD_PWR, (1 << vdd) & p_d->ocr_mask);
 }
 
 static int corgi_mci_get_ro(struct device *dev)
 {
-   return GPLR(CORGI_GPIO_nSD_WP) & GPIO_bit(CORGI_GPIO_nSD_WP);
+   return gpio_get_value(CORGI_GPIO_nSD_WP);
 }
 
 static void corgi_mci_exit(struct device *dev, void *data)
@@ -269,10 +267,7 @@ static struct pxamci_platform_data corgi
  */
 static void corgi_irda_transceiver_mode(struct device *dev, int mode)
 {
-   if (mode & IR_OFF)
-   GPSR(CORGI_GPIO_IR_ON) = GPIO_bit(CORGI_GPIO_IR_ON);
-   else
-   GPCR(CORGI_GPIO_IR_ON) = GPIO_bit(CORGI_GPIO_IR_ON);
+   gpio_set_value(CORGI_GPIO_IR_ON, mode & IR_OFF);
 }
 
 static struct pxaficp_platform_data corgi_ficp_platform_data = {
Index: pxa/drivers/usb/gadget/pxa2xx_udc.h
===
--- pxa.orig/drivers/usb/gadget/pxa2xx_udc.h   2006-12-10 01:31:53.0 -0800
+++ pxa/drivers/usb/gadget/pxa2xx_udc.h 2006-12-29 16:32:41.0 -0800
@@ -139,6 +139,8 @@ struct pxa2xx_udc {
struct pxa2xx_ep                ep [PXA_UDC_NUM_ENDPOINTS];
 };
 
+static struct pxa2xx_udc *the_controller;
+
 /*-*/
 
 #ifdef CONFIG_ARCH_LUBBOCK
@@ -175,25 +177,6 @@ struct pxa2xx_udc {
 
 /*-*/
 
-static struct pxa2xx_udc *the_controller;
-
-static inline int pxa_gpio_get(unsigned gpio)
-{
-   return (GPLR(gpio) & GPIO_bit(gpio)) != 0;
-}
-
-static inline void pxa_gpio_set(unsigned gpio, int is_on)
-{
-   int mask = GPIO_bit(gpio);
-
-   if (is_on)
-   GPSR(gpio) = mask;
-   else
-   GPCR(gpio) = mask;
-}
-
-/*-*/
-
 /*
  * Debugging support vanishes in non-debug builds.  DBG_NORMAL should be
  * mostly silent during normal use/testing, with no timing side-effects.
Index: pxa/arch/arm/mach-pxa/corgi_ssp.c
===
--- pxa.orig/arch/arm/mach-pxa/corgi_ssp.c  2006-12-10 01:30:42.0 -0800
+++ pxa/arch/arm/mach-pxa/corgi_ssp.c   2006-12-29 16:16:18.0 -0800
@@ -16,6 +16,8 @@
 #include 
 #include 
 #include 
+
+#include 
 #include 
 #include 
 
@@ -52,13 +54,13 @@ unsigned long corgi_ssp_ads7846_putget(u
 
spin_lock_irqsave(&corgi_ssp_lock, flag);
if (ssp_machinfo->cs_ads7846 >= 0)
-   GPCR(ssp_machinfo->cs_ads7846) = GPIO_bit(ssp_machinfo->cs_ads7846);
+   gpio_set_value(ssp_machinfo->cs_ads7846, 0);
 
ssp_write_word(&corgi_ssp_dev,data);
ssp_read_word(&corgi_ssp_dev, &ret);
 
if (ssp_machinfo->cs_ads7846 >= 0)
-   GPSR(ssp_machin

Re: Finding hardlinks

2006-12-29 Thread Mikulas Patocka

On Fri, 29 Dec 2006, Trond Myklebust wrote:

> On Thu, 2006-12-28 at 19:14 +0100, Mikulas Patocka wrote:
> > Why don't you rip off the support for colliding inode number from the
> > kernel at all (i.e. remove iget5_locked)?
> >
> > It's reasonable to have either no support for colliding ino_t or full
> > support for that (including syscalls that userspace can use to work with
> > such filesystem) --- but I don't see any point in having half-way support
> > in kernel as is right now.
>
> What would ino_t have to do with inode numbers? It is only used as a
> hash table lookup. The inode number is set in the ->getattr() callback.

The question is: why does the kernel contain an iget5 function that looks
up inodes according to a callback, if a filesystem cannot have more than a
64-bit inode identifier?

This lookup callback just encourages writing bad filesystems with colliding
inode numbers. Either remove coda, smb (and possibly other) filesystems
from the kernel or add proper userspace support for them.

The situation is that coreutils 6.7 fails to recursively copy
directories if any two directories in the tree have colliding inode
numbers, so you get random data corruption with these filesystems.
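Here is a minimal sketch of why colliding inode numbers confuse a recursive copier. Such tools remember the files and directories they have already handled by the (st_dev, st_ino) pair that stat(2) reports. The fixed-size table below is a hypothetical stand-in for whatever structure coreutils actually uses. The point is only that two distinct directories reporting the same pair cannot be told apart by such a check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* One remembered object, identified the way stat(2) reports it. */
struct file_id {
	dev_t dev;
	ino_t ino;
};

#define MAX_SEEN 64

struct seen_set {
	struct file_id ids[MAX_SEEN];
	size_t n;
};

/* Returns true if (dev, ino) was already recorded; otherwise records it.
 * A copier that gets "true" may treat the object as a hard link or a
 * directory cycle instead of copying its contents. */
bool seen_before(struct seen_set *s, dev_t dev, ino_t ino)
{
	for (size_t i = 0; i < s->n; i++)
		if (s->ids[i].dev == dev && s->ids[i].ino == ino)
			return true;
	assert(s->n < MAX_SEEN);
	s->ids[s->n].dev = dev;
	s->ids[s->n].ino = ino;
	s->n++;
	return false;
}
```

In a real program, the pair would come from `st_dev`/`st_ino` in the `struct stat` filled in by `lstat()`. A second directory that reports an already-seen pair is indistinguishable from the first.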


Mikulas


Re: Ok, explained.. (was Re: [PATCH] mm: fix page_mkclean_one)

2006-12-29 Thread Linus Torvalds


On Fri, 29 Dec 2006, Andrew Morton wrote:
>
> > > Somewhat nastily, but as ext3 directories are metadata it is appropriate
> > > that modifications to them be done in terms of buffer_heads (ie: blocks).
> > 
> > No. There is nothing "appropriate" about using buffer_heads for metadata. 
> 
> I said "modification".

You said "metadata".

Why do you think directories are any different from files? Yes, they are 
metadata. So what? What does that have to do with anything?

They should still use virtual indexes, the way files do. That doesn't 
preclude them from using buffer-heads to mark their (partial-page) 
modifications and for keeping the virtual->physical translations cached.

I mean, really. Look at ext2. It does exactly that. It keeps the 
directories in the page cache - virtually indexed. And it even modifies 
them there. Exactly the same way it modifies regular file data.

It all works exactly the same way it works for regular files. It uses

page->mapping->a_ops->prepare_write(NULL, page, from, to);
... do modification ...
ext2_commit_chunk(page, from, to);

exactly the way regular file data works. 

That's why I'm saying there is absolutely _zero_ thing about "metadata" 
here, or even about "modifications". It all works better in a virtual 
cache, because you get all the support that we give to page caches.

So I really don't understand why you make excuses for ext3 and talk about 
"modifications" and "metadata". It was a fine design ten years ago. It's 
not really very good any longer.

I suspect we're stuck with the design, but that doesn't make it any 
_better_.

Linus


Re: open /dev/kvm: No such file or directory

2006-12-29 Thread H. Peter Anvin

Avi Kivity wrote:
>
> Greg, /dev/kvm is a MISC_DYNAMIC_MINOR device.  Is there any way of 
> using it without udev?  Should I allocate a static number?
>

Especially for something like /dev/kvm, I think it would make sense to 
allocate a static number for it.
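With a static minor, the node could be created without udev, for example by an init script calling mknod(2). Here is a minimal userspace sketch. Misc devices share character major 10. The minor is a parameter because no number is assigned in this thread, and the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

#define MISC_MAJOR 10          /* major number shared by all misc devices */
#define MISC_DYNAMIC_MINOR 255 /* "assign at registration", not a real minor */

/* Build the dev_t for a misc device with a fixed (static) minor. */
dev_t misc_devt(unsigned int minor)
{
	assert(minor != MISC_DYNAMIC_MINOR);
	return makedev(MISC_MAJOR, minor);
}

/* Create a character device node such as /dev/kvm (needs root).
 * Returns 0 on success, -1 with errno set on failure. */
int create_misc_node(const char *path, unsigned int minor)
{
	return mknod(path, S_IFCHR | 0660, misc_devt(minor));
}
```

With a dynamic minor, the number is only known after the module registers. That is why something like udev has to read it back and create the node.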


-hpa



Re: Ok, explained.. (was Re: [PATCH] mm: fix page_mkclean_one)

2006-12-29 Thread Linus Torvalds


On Fri, 29 Dec 2006, Andrew Morton wrote:
> 
> Adam Richter spent considerable time a few years ago trying to make the
> mpage code go direct-to-BIO in all cases and we eventually gave up.  The
> conceptual layering of page<->blocks<->bio is pretty clean, and it is hard
> and ugly to fully optimise away the "block" bit in the middle.

Using the buffer cache as a translation layer to the physical address is 
fine. That's what _any_ block device will do.

I'm not at all saying that "buffer heads must go away". They work fine.

What I'm saying is that

 - if you index by buffer heads, you're screwed.
 - if you do IO by starting at buffer heads, you're screwed.

Both indexing and writeback decisions should be done at the page cache 
layer. Then, when you actually need to do IO, you look at the buffers. But 
you start from the "page". YOU SHOULD NEVER LOOK UP a buffer on its own 
merits, and YOU SHOULD NEVER DO IO on a buffer head on its own cognizance.

So by all means keep the buffer heads as a way to keep the 
"virtual->physical" translation. It's what they were designed for. But 
they were _originally_ also designed for "lookup" and "driving the start 
of IO", and that is wrong, and has been wrong for a long time now, because

 - lookup based on physical address is fundamentally slow and inefficient. 
   You have to look up the virtual->physical translation somewhere else, 
   so it's by design an unnecessary indirection _and_ that "somewhere 
   else" is also by definition filesystem-specific, so you can't do any 
   of these things at the VFS layer.

   Ergo: anything that needs to look up the physical address in order to 
   find the buffer head is BROKEN in this day and age. We look up the 
   _virtual_ page cache page, and then we can trivially find the buffer 
   heads within that page thanks to page->buffers.

   Example: ext2 vs ext3 readdir. One of them sucks, the other doesn't. 

 - starting IO based on the physical entity is insane. It's insane exactly 
   _because_ the VM doesn't actually think in physical addresses, or in 
   buffer-sized blocks. The VM only really knows about whole pages, and 
   all the VM decisions fundamentally have to be page-based. We don't ever 
   "free a buffer". We free a whole page, and as such, doing writeback 
   based on buffers is pointless, because it doesn't actually say anything 
   about the "page state" which is what the VM tracks.

But neither of these means that "buffer_head" itself has to go away. They 
both really boil down to the same thing: you should never KEY things by 
the buffer head. All actions should be based on virtual indexes as far as 
at all humanly possible.
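The lookup argument can be sketched with a toy model in plain C rather than kernel code. The physically indexed path must first translate (file, logical block) to a disk block through a filesystem-specific map before it can even ask whether the block is cached. The virtually indexed path probes with (file, index) directly. All names and the direct-mapped arrays are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_LBLK 8    /* logical blocks in the toy file */
#define NR_PBLK 64   /* physical blocks on the toy disk */

struct toy_file {
	unsigned long block_map[NR_LBLK];  /* logical -> physical block */
	bool page_cached[NR_LBLK];         /* virtually indexed cache */
};

static bool buffer_cached[NR_PBLK];    /* physically indexed cache */
static unsigned long translations;     /* virtual->physical lookups done */

/* Filesystem-specific translation; in a real fs it may itself need I/O. */
static unsigned long toy_bmap(const struct toy_file *f, unsigned long lblk)
{
	assert(lblk < NR_LBLK);
	translations++;
	return f->block_map[lblk];
}

/* Bring logical block lblk into both caches (the read itself is elided). */
void toy_cache_block(struct toy_file *f, unsigned long lblk)
{
	assert(lblk < NR_LBLK && f->block_map[lblk] < NR_PBLK);
	f->page_cached[lblk] = true;
	buffer_cached[f->block_map[lblk]] = true;
}

/* Virtual lookup: (file, index) is the key, no translation needed. */
bool virt_lookup(const struct toy_file *f, unsigned long lblk)
{
	assert(lblk < NR_LBLK);
	return f->page_cached[lblk];
}

/* Physical lookup: must translate first, even just to ask "is it cached?" */
bool phys_lookup(const struct toy_file *f, unsigned long lblk)
{
	unsigned long pblk = toy_bmap(f, lblk);

	assert(pblk < NR_PBLK);
	return buffer_cached[pblk];
}

unsigned long translation_count(void)
{
	return translations;
}
```

Every `phys_lookup()` pays a `toy_bmap()` call, hit or miss. `virt_lookup()` never does, which is the "unnecessary indirection" described above.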

Once you do lookup and locking and writeback _starting_ from the page, 
it's then easy to look up the actual buffer head within the page, and use 
that as a way to do the actual _IO_ on the physical address. So the buffer 
heads still exist in ext2, for example, but they don't drive the show 
quite as much.

(They still do in some areas: the allocation bitmaps, the xattr code etc. 
But as long as none of those have big VM footprints, and as long as no 
_common_ operations really care deeply, and as long as those data 
structures never need to be touched by the VM or VFS layer, nobody will 
ever really care).

The directory case comes up just because "readdir()" actually is very 
common, and sometimes very slow. And it can have a big VM working set 
footprint ("find"), so trying to be page-based actually really helps, 
because it all drives things like writeback on the _right_ issues, and we 
can do things like LRU's and writeback decisions on the level that really 
matters.

I actually suspect that the inode tables could benefit from being in the 
page cache too (although I think that the inode buffer address is actually 
"physical", so there's no indirection for inode tables, which means that 
the virtual vs physical addressing doesn't matter). For directories, there 
definitely is a big cost to continually doing the virtual->physical 
translation all the time.

Linus


Re: [BUG KERNEL 2.6.20-rc1] ftp: get or put stops during file-transfer

2006-12-29 Thread Komuro
Hi,

I investigated the ftp transfer-stall problem with git bisect
and found that it was introduced by the
"[TCP]: MD5 Signature Option (RFC2385) support" patch.

YOSHIFUJI-san, please fix this problem.

>commit cfb6eeb4c860592edd123fdea908d23c6ad1c7dc
>Author: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
>Date:   Tue Nov 14 19:07:45 2006 -0800
>
>[TCP]: MD5 Signature Option (RFC2385) support.
>
>Based on implementation by Rick Payne.
>
>Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
>Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

Best Regards
Komuro


> On Sun, Dec 17, 2006 at 11:23:11PM +0900, Komuro wrote:
> > On Sun, 17 Dec 2006 04:02:22 +
> > Al Viro <[EMAIL PROTECTED]> wrote:
> > 
> > > On Sun, Dec 17, 2006 at 09:27:52PM +0900, Komuro wrote:
> > > > 
> > > > Hello,
> > > > 
> > > > On kernel 2.6.20-rc1, ftp (get or put) stops
> > > > during file-transfer.
> > > > 
> > > > Client: ftp-0.17-33.fc6  (192.168.1.1)
> > > > Server: vsftpd-2.0.5-8   (192.168.1.3)
> > > > 
> > > > This problem does _not_ happen on kernel-2.6.19.
> > > > is it caused by network-subsystem change on 2.6.20-rc1??
> > > 
> > > Do you have NAT between you and server?
> > 
> > No. I don't have NAT between the client and the server.
> > Actually, the client and the sever is located in same room.
> > 
> > client -- 100MbpsHub -- server.



Re: OpenAFS gatekeepers request addition of AFS_SUPER_MAGIC to magic.h

2006-12-29 Thread Stephen Frost
* Adam Megacz ([EMAIL PROTECTED]) wrote:
> --- include/linux/magic.h   2006-12-29 15:48:50.0 -0800
> +++ include/linux/magic.h   2006-11-29 13:57:37.0 -0800
> @@ -3,7 +3,6 @@
>  
>  #define ADFS_SUPER_MAGIC   0xadf5
>  #define AFFS_SUPER_MAGIC   0xadff
> -#define AFS_SUPER_MAGIC    0x5346414F
>  #define AUTOFS_SUPER_MAGIC 0x0187
>  #define CODA_SUPER_MAGIC   0x73757245
>  #define EFS_SUPER_MAGIC    0x414A53

Wouldn't you want a patch which *adds* it, rather than one which
*removes* it...?

Thanks,

Stephen




Re: Ok, explained.. (was Re: [PATCH] mm: fix page_mkclean_one)

2006-12-29 Thread Andrew Morton
On Fri, 29 Dec 2006 16:11:44 -0800 (PST)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> 
> 
> > JBD implements physical block-based journalling, so it is 100% appropriate
> > that JBD deal with these disk blocks using their buffer_head
> > representation.
> 
> And as long as it does that, you just have to face the fact that it's 
> going to perform like crap, including what you call "extra" writes, and 
> what I call "deal with it".

It is quite tiresome to delete things which your interlocutor said and to
then restate them as if it were some sort of revelation.

> > Somewhat nastily, but as ext3 directories are metadata it is appropriate
> > that modifications to them be done in terms of buffer_heads (ie: blocks).
> 
> No. There is nothing "appropriate" about using buffer_heads for metadata. 

I said "modification".

> [stuff about directory reads elided]



Re: Ok, explained.. (was Re: [PATCH] mm: fix page_mkclean_one)

2006-12-29 Thread Linus Torvalds


On Fri, 29 Dec 2006, Andrew Morton wrote:
> 
> They're extra.  As in "can be optimised away".

Sure. Don't use buffer heads.

> The buffer_head is not an IO container.  It is the kernel's core
> representation of a disk block.

Please come back from the 90's.

The buffer heads are nothing but a mapping of where the hardware block is. 
If you use it for anything else, you're basically screwed.

> JBD implements physical block-based journalling, so it is 100% appropriate
> that JBD deal with these disk blocks using their buffer_head
> representation.

And as long as it does that, you just have to face the fact that it's 
going to perform like crap, including what you call "extra" writes, and 
what I call "deal with it".

Btw, you can make pages be physically indexed too, but they obviously
 (a) won't be coherent with any virtual mapping laid on top of it
 (b) will be _physical_, so any readahead etc will be based on physical 
 addresses too.

> I thought I fixed the performance problem?

No, you papered over it, for the reasonably common case where things were 
physically contiguous - exactly by using a physical page cache, so now it 
can do read-ahead based on that. Then, because the pages contain buffer 
heads, the directory accesses can look up buffers, and if it was all 
physically contiguous, it all works fine.

But if you actually want virtually indexed caching (and all _users_ want 
it), it really doesn't work.

> Somewhat nastily, but as ext3 directories are metadata it is appropriate
> that modifications to them be done in terms of buffer_heads (ie: blocks).

No. There is nothing "appropriate" about using buffer_heads for metadata. 

It's quite proper - and a hell of a lot more efficient - to use virtual 
page-caching for metadata too.

Look at the ext2 readdir() implementation, and compare it to the crapola 
horror that is ext3. Guess what? ext2 uses virtually indexed metadata, and 
as a result it is both simpler, smaller and a LOT faster than ext3 in 
accessing that metadata.

Face it, Andrew, you're wrong on this one. Really. Just take a look at 
ext2_readdir(). 

[ I'm not saying that ext2_readdir() is _beautiful_. If it had been 
  written with the page cache in mind, it would probably have been done 
  very differently. And it doesn't do any readahead, probably because 
  nobody cared enough, but it should be trivial to add, and it would 
  automatically "do the right thing" just because it's much easier at the 
  page cache level.

  But I _am_ saying that compared to ext3, the ext2 readdir is a work of 
  art. ]

"metadata" has _zero_ to do with "physically indexed". There is no 
correlation what-so-ever. If you think there is a correlation, it's all in 
your mind.

Linus


Re: Ok, explained.. (was Re: [PATCH] mm: fix page_mkclean_one)

2006-12-29 Thread Andrew Morton
On Fri, 29 Dec 2006 18:32:07 -0500
Theodore Tso <[EMAIL PROTECTED]> wrote:

> On Fri, Dec 29, 2006 at 02:42:51PM -0800, Linus Torvalds wrote:
> > I think ext3 is terminally crap by now. It still uses buffer heads in 
> > places where it really really shouldn't, and as a result, things like 
> > directory accesses are simply slower than they should be. Sadly, I don't 
> > think ext4 is going to fix any of this, either.
> 
> Not just ext3; ocfs2 is using the jbd layer as well.  I think we're
> going to have to put this (a rework of jbd2 to use the page cache) on
> the ext4 todo list, and work with the ocfs2 folks to try to come up
> with something that suits their needs as well.  Fortunately we have
> this filesystem/storage summit thing coming up in the next few months,
> and we can try to get some discussion going on the linux-ext4 mailing
> list in the meantime.  Unfortunately, I don't think this is going to
> be trivial.

I suspect it would be insane to move any part of JBD (apart from the
ordered-data flush) to use pagecache.  The whole thing is fundamentally
block-based.  But only for metadata - there's no strong reason why ext3/4
needs to manipulate file data via buffer_heads if data=journal and chattr
+j aren't in use.

We could possibly move ext3/4 directories out of the blockdev pagecache and
into per-directory pagecache, but that wouldn't change anything - the
journalling would still be block-based.

Adam Richter spent considerable time a few years ago trying to make the
mpage code go direct-to-BIO in all cases and we eventually gave up.  The
conceptual layering of page<->blocks<->bio is pretty clean, and it is hard
and ugly to fully optimise away the "block" bit in the middle.

buffer_heads become more important with large PAGE_CACHE_SIZE.  I'd expect
nobh mode to be quite inefficient with some workloads on 64k pages.  We
need that representation of the state (and location) of the block-sized
hunks which make up the page.
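The large-page point is arithmetic. With 64 KiB pages and 4 KiB filesystem blocks, one page spans 65536 / 4096 = 16 blocks. The toy sketch below uses hypothetical names, with a bitmask standing in for the per-block state that buffer_heads carry. It shows what is lost without per-block tracking: a one-byte write dirties one block, but page-granular writeback has to write all 16. This is a simplification, not a description of how nobh mode is actually implemented.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE   65536u                           /* 64 KiB page */
#define TOY_BLOCK_SIZE  4096u                            /* 4 KiB block */
#define BLOCKS_PER_PAGE (TOY_PAGE_SIZE / TOY_BLOCK_SIZE) /* = 16 */

/* One dirty bit per block-sized hunk, standing in for per-buffer state. */
struct toy_page {
	uint32_t dirty_blocks;
};

/* Record a write of len bytes at byte offset off within the page. */
void toy_write(struct toy_page *p, unsigned off, unsigned len)
{
	assert(len > 0 && off + len <= TOY_PAGE_SIZE);
	for (unsigned b = off / TOY_BLOCK_SIZE;
	     b <= (off + len - 1) / TOY_BLOCK_SIZE; b++)
		p->dirty_blocks |= 1u << b;
}

/* Bytes to write back when each block's state is known. */
unsigned writeback_per_block(const struct toy_page *p)
{
	unsigned n = 0;

	for (unsigned b = 0; b < BLOCKS_PER_PAGE; b++)
		if (p->dirty_blocks & (1u << b))
			n++;
	return n * TOY_BLOCK_SIZE;
}

/* Bytes to write back when only "the page is dirty" is known. */
unsigned writeback_whole_page(const struct toy_page *p)
{
	return p->dirty_blocks ? TOY_PAGE_SIZE : 0;
}
```

Take a single one-byte write at offset 5000, which falls in block 1. The per-block count is 4096 bytes, while whole-page writeback reports 65536 bytes.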

> If we do get this fixed for ext4, one interesting question is whether
> people would accept a patch to backport the fixes to ext3, given the
> the grief this is causing the page I/O and VM routines.  OTOH, reiser3
> probably has the same problems, and I suspect the changes to ext3 to
> cause it to avoid buffer heads, especially in order to support for
> filesystem blocksizes < pagesize, are going to be sufficiently risky
> in terms of introducing regressions to ext3 that they would probably
> be rejected on those grounds.  So unfortunately, we probably are going
> to have to support flushes via buffer heads for the foreseeable
> future.

We'll see.



OpenAFS gatekeepers request addition of AFS_SUPER_MAGIC to magic.h

2006-12-29 Thread Adam Megacz

Hello,

Jeffrey Altman, one of the gatekeepers of OpenAFS (the open source
project which inherited the Transarc/IBM AFS codebase) has requested
that the magic number 0x5346414F (little endian 'OAFS') be allocated
for the f_type field of struct statfs on Linux:

  https://lists.openafs.org/pipermail/openafs-info/2006-December/024829.html

I would like to offer the patch below for inclusion in the source
tree, if possible.  The patch adds it to include/linux/magic.h, mostly
as a way of publishing this number and ensuring that no other
filesystem accidentally uses it.
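For context, here is a minimal userspace sketch of how such a number is consumed. statfs(2) fills in `f_type`, which a program compares against the filesystem's magic number. The helper names are hypothetical. The second function only checks the 'OAFS' claim by spelling out 0x5346414F least-significant byte first.

```c
#include <assert.h>
#include <string.h>
#include <sys/vfs.h>

#define AFS_SUPER_MAGIC 0x5346414F   /* value proposed in this thread */

/* Returns 1 if path lives on a filesystem reporting AFS_SUPER_MAGIC,
 * 0 if not, and -1 if statfs() fails (errno is set). */
int is_openafs(const char *path)
{
	struct statfs sfs;

	if (statfs(path, &sfs) != 0)
		return -1;
	return sfs.f_type == AFS_SUPER_MAGIC;
}

/* Spell out a 32-bit magic number least-significant byte first. */
void magic_to_le_chars(unsigned long magic, char out[5])
{
	for (int i = 0; i < 4; i++)
		out[i] = (char)((magic >> (8 * i)) & 0xff);
	out[4] = '\0';
}
```

The bytes 0x4F, 0x41, 0x46 and 0x53 are 'O', 'A', 'F' and 'S'. Two filesystems sharing a value would make this check ambiguous, which is why publishing it in magic.h matters.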

  - a


--- include/linux/magic.h   2006-12-29 15:48:50.0 -0800
+++ include/linux/magic.h   2006-11-29 13:57:37.0 -0800
@@ -3,7 +3,6 @@
 
 #define ADFS_SUPER_MAGIC   0xadf5
 #define AFFS_SUPER_MAGIC   0xadff
-#define AFS_SUPER_MAGIC    0x5346414F
 #define AUTOFS_SUPER_MAGIC 0x0187
 #define CODA_SUPER_MAGIC   0x73757245
 #define EFS_SUPER_MAGIC    0x414A53



Re: Want comments regarding patch

2006-12-29 Thread Bodo Eggert
Jan Engelhardt <[EMAIL PROTECTED]> wrote:
> On Dec 29 2006 07:57, Daniel Marjamäki wrote:

>> It was my goal to improve the readability. I failed.
>>
>> I personally prefer to use standard functions instead of writing code.
>> In my opinion using standard functions means less code that is easier to
>> read.
> 
> Hm in that case, what about having something like
> 
> void *memset_int(void *a, int x, int n) {
> asm("mov %0, %%esi;
>  mov %1, %%eax;
>  mov %2, %%ecx;
>  repz movsd;",
>a,x,n);
> }

This would copy the to-be-initialized buffer somewhere, if it compiles.

1) You want stosd, "store string", not "move string"
2) You'll want to set %%edi (destination index) instead of %%esi.
3) repz should be illegal for movs, it might be interpreted as rep by
   defective assemblers, since it generates the same prefix. "rep" is
   correct here, since you don't want to break on (non-)zero-words.
4) Mind the direction flag.
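Here is a minimal corrected sketch that applies the four points above. It uses GCC-style inline asm on x86/x86-64 only, with a plain C loop as the portable fallback. It stores with `rep stosl` (the AT&T spelling of Intel `rep stosd`). The destination goes in (E/R)DI, the count in (E/R)CX and the value in EAX, and the direction flag is cleared first. The signature follows Jan's proposal but uses size_t for the count; treat it as an illustration, not a drop-in kernel helper.

```c
#include <assert.h>
#include <stddef.h>

/* Fill n ints at a with the value x; returns a, like memset().
 * The asm path assumes sizeof(int) == 4, which holds for GCC on x86. */
void *memset_int(void *a, int x, size_t n)
{
	assert(a != NULL || n == 0);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
	void *dst = a;    /* rep stos advances (E/R)DI ... */
	size_t cnt = n;   /* ... and counts (E/R)CX down to zero */

	__asm__ __volatile__("cld\n\t"       /* store upwards (point 4) */
			     "rep stosl"     /* *dst++ = eax, cnt times (points 1-3) */
			     : "+D" (dst), "+c" (cnt)
			     : "a" (x)
			     : "memory", "cc");
#else
	int *p = a;

	for (size_t i = 0; i < n; i++)
		p[i] = x;
#endif
	return a;
}
```

The "memory" clobber tells the compiler the asm writes through `dst`. Without it, later reads of the buffer could be reordered or cached across the store.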



Re: Ok, explained.. (was Re: [PATCH] mm: fix page_mkclean_one)

2006-12-29 Thread Linus Torvalds


On Fri, 29 Dec 2006, Theodore Tso wrote:
>
> If we do get this fixed for ext4, one interesting question is whether
> people would accept a patch to backport the fixes to ext3, given the
> the grief this is causing the page I/O and VM routines.

I don't think backporting is the smartest option (unless it's done _way_ 
later), but the real problem with it isn't actually the VM behaviour, but 
simply the fact that cached performance absolutely _sucks_ with the buffer 
cache.

With the physically indexed buffer cache thing, you end up always having 
to do these complicated translations into block numbers for every single 
access, and at some point when I benchmarked it, it was a huge overhead 
for doing simple things like readdir.

It's also a major pain for read-ahead, exactly partly due to the high cost 
of translation - because you can't cheaply check whether the next block is 
there, the cost of even asking the question "should I try to read ahead?" 
is much much higher. As a result, read-ahead is seriously limited, because 
it's so expensive for the cached case (which is still hopefully the 
_common_ case).

So because read-ahead is limited, the non-cached case then _really_ sucks.

It was somewhat fixed in a really god-awful fashion by having 
ext3_readdir() actually do _readahead_ through the page cache, even though 
it does everything else through the buffer cache. And that just happens to 
work because we hopefully have physically contiguous blocks, but when that 
isn't true, the readahead doesn't do squat.

It's really quite fundamentally broken. But none of that causes any 
problems for the VM, since directories cannot be mmap'ed anyway. But it's 
really pitiful, and it really doesn't work very well. Of course, other 
filesystems _also_ suck at this, and other operating systems have even 
MORE problems, so people don't always seem to realize how horribly 
horribly broken this all is.

I really wish somebody would write a filesystem that did large cold-cache 
directories well. Open some horrible file manager on /usr/bin with cold 
caches, and weep. The biggest problem is the inode indirection, but at 
some point when I looked at why it sucked, it was doing basically 
synchronous single-buffer reads on the directory too, because readahead 
didn't work properly.

I was hoping that something like SpadFS would actually take off, because 
it seemed to do a lot of good design choices (having inodes in-line in the 
directory for when there are no hardlinks is probably a requirement for a 
good filesystem these days. The separate inode table had its uses, but 
indirection in a filesystem really does suck, and stat information is too 
important to be indirect unless it absolutely has to).

But I suspect it needs more than somebody who just wants to get his thesis 
written ;)

Linus


Re: [PATCH 1/6] UML - Console locking fixes

2006-12-29 Thread Randy Dunlap
On Fri, 29 Dec 2006 18:41:27 -0500 Jeff Dike wrote:

>  arch/um/drivers/line.c  |  188 ++--
>  arch/um/drivers/stdio_console.c |6 -
>  arch/um/include/line.h  |   14 +-
>  3 files changed, 135 insertions(+), 73 deletions(-)
> 
> Index: linux-2.6.18-mm/arch/um/drivers/line.c
> ===
> --- linux-2.6.18-mm.orig/arch/um/drivers/line.c   2006-12-29 15:12:45.0 -0500
> +++ linux-2.6.18-mm/arch/um/drivers/line.c    2006-12-29 17:26:26.0 -0500
> @@ -421,42 +420,84 @@ int line_setup_irq(int fd, int input, in
>   return err;
>  }
>  
> +/* Normally, a driver like this can rely mostly on the tty layer

/*
 * Normally, ...

> + * locking, particularly when it comes to the driver structure.
> + * However, in this case, mconsole requests can come in "from the
> + * side", and race with opens and closes.
> + *
> + * The problem comes from line_setup not wanting to sleep if
> + * the device is open or being opened.  This can happen because the
> + * first opener of a device is responsible for setting it up on the
> + * host, and that can sleep.  The open of a port device will sleep
> + * until someone telnets to it.
> + *
> + * The obvious solution of putting everything under a mutex fails
> + * because then trying (and failing) to change the configuration of an
> + * open(ing) device will block until the open finishes.  The right
> + * thing to happen is for it to fail immediately.
> + *
> + * We can put the opening (and closing) of the host device under a
> + * separate lock, but that has to be taken before the count lock is
> + * released.  Otherwise, you open a window in which another open can
> + * come through and assume that the host side is opened and working.
> + *
> + * So, if the tty count is one, open will take the open mutex
> + * inside the count lock.  Otherwise, it just returns. This will sleep
> + * if the last close is pending, and will block a setup or get_config,
> + * but that should not last long.
> + *
> + * So, what we end up with is that open and close take the count lock.
> + * If the first open or last close are happening, then the open mutex
> + * is taken inside the count lock and the host opening or closing is done.
> + *
> + * setup and get_config only take the count lock.  setup modifies the
> + * device configuration only if the open count is zero.  Arbitrarily
> + * long blocking of setup doesn't happen because something would have to be
> + * waiting for an open to happen.  However, a second open with
> + * tty->count == 1 can't happen, and a close can't happen until the open
> + * had finished.
> + *
> + * We can't maintain our own count here because the tty layer doesn't
> + * match opens and closes.  It will call close if an open failed, and
> + * a tty hangup will result in excess closes.  So, we rely on
> + * tty->count instead.  It is one on both the first open and last close.
> + */
> +
>  int line_open(struct line *lines, struct tty_struct *tty)
>  {
> - struct line *line;
> + struct line *line = &lines[tty->index];
>   int err = -ENODEV;
>  
> - line = &lines[tty->index];
> - tty->driver_data = line;
> + spin_lock(&line->count_lock);
> + if(!line->valid)

if (
(several of these)

> + goto out_unlock;
> +
> + err = 0;
> + if(tty->count > 1)
> + goto out_unlock;
>  
...

> + if(!line->sigio){
> + chan_enable_winch(&line->chan_list, tty);
> + line->sigio = 1;
>   }
>  
> - err = 0;
>  }

> @@ -466,25 +507,38 @@ void line_close(struct tty_struct *tty, 
> + if(line == NULL)
> + return;

again.

>  
>   /* We ignore the error anyway! */
>   flush_buffer(line);
>  
> - if(tty->count == 1){
> - line->tty = NULL;
> - tty->driver_data = NULL;
> -
> - if(line->sigio){
> - unregister_winch(tty);
> - line->sigio = 0;
> - }
> + spin_lock(&line->count_lock);
> + if(!line->valid)
> + goto out_unlock;
> +
> + if(tty->count > 1)
> + goto out_unlock;
> +

ditto.  tritto.

> + mutex_lock(&line->open_mutex);
> + spin_unlock(&line->count_lock);
> +
> + line->tty = NULL;
> + tty->driver_data = NULL;
> +
> + if(line->sigio){

again.

> + unregister_winch(tty);
> + line->sigio = 0;
>  }
>  
> - spin_unlock_irq(&line->lock);
> + mutex_unlock(&line->open_mutex);
> + return;
> +
> +out_unlock:
> + spin_unlock(&line->count_lock);
>  }
>  
>  void close_lines(struct line *lines, int nlines)
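The locking scheme in the patch comment can be modelled in userspace to check its invariants. This is a hypothetical sketch. pthread mutexes stand in for both the kernel spinlock (count_lock) and the mutex (open_mutex). `tty_count` stands in for tty->count. The `in_use` flag and the -EBUSY return from setup are modelling choices, not taken from the patch. It is written in the `if (` and comment style requested above.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct toy_line {
	pthread_mutex_t count_lock;   /* stands in for the spinlock */
	pthread_mutex_t open_mutex;   /* serialises host open/close */
	bool valid;
	bool in_use;                  /* set on first open, cleared on last close */
	bool host_open;
	int config;
};

int toy_line_open(struct toy_line *line, int tty_count)
{
	int err = -ENODEV;

	pthread_mutex_lock(&line->count_lock);
	if (!line->valid)
		goto out_unlock;

	err = 0;
	if (tty_count > 1)
		goto out_unlock;

	/*
	 * First opener: take open_mutex before dropping count_lock, so no
	 * other opener can assume the host side is already set up.
	 */
	pthread_mutex_lock(&line->open_mutex);
	line->in_use = true;
	pthread_mutex_unlock(&line->count_lock);

	line->host_open = true;       /* may sleep in the real driver */
	pthread_mutex_unlock(&line->open_mutex);
	return 0;

out_unlock:
	pthread_mutex_unlock(&line->count_lock);
	return err;
}

void toy_line_close(struct toy_line *line, int tty_count)
{
	pthread_mutex_lock(&line->count_lock);
	if (!line->valid || tty_count > 1)
		goto out_unlock;

	pthread_mutex_lock(&line->open_mutex);
	line->in_use = false;
	pthread_mutex_unlock(&line->count_lock);

	line->host_open = false;      /* tear down the host side */
	pthread_mutex_unlock(&line->open_mutex);
	return;

out_unlock:
	pthread_mutex_unlock(&line->count_lock);
}

/* Setup takes only count_lock and fails at once on an open device. */
int toy_line_setup(struct toy_line *line, int new_config)
{
	int err = 0;

	pthread_mutex_lock(&line->count_lock);
	if (line->in_use)
		err = -EBUSY;
	else
		line->config = new_config;
	pthread_mutex_unlock(&line->count_lock);
	return err;
}
```

Setup never waits on open_mutex, so a slow host open (such as a port waiting for a telnet) cannot block it. It only sees whether the line is in use.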


---
~Randy

RE: 2.6.19-rt14 slowdown compared to 2.6.19

2006-12-29 Thread Chen, Tim C
Ingo Molnar wrote:
> 
> If you'd like to profile this yourself then the lowest-cost way of
> profiling lock contention on -rt is to use the yum kernel and run the
> attached trace-it-lock-prof.c code on the box while your workload is
> in 'steady state' (and is showing those extended idle times):
> 
>   ./trace-it-lock-prof > trace.txt
> 
> this captures up to 1 second worth of system activity, on the current
> CPU. Then you can construct the histogram via:
> 
>   grep -A 1 ' __schedule()<-' trace.txt | cut -d: -f2- | sort |
>   uniq -c | sort -n > prof.txt
> 

I did lock profiling on Volanomark as suggested and obtained the 
profile that is listed below. 

    246  __sched_text_start()<-schedule()<-rt_spin_lock_slowlock()<-__lock_text_start()
    264  rt_mutex_slowunlock()<-rt_mutex_unlock()<-rt_up_read()<-(-1)()
    334  __sched_text_start()<-schedule()<-posix_cpu_timers_thread()<-kthread()
    437  __sched_text_start()<-schedule()<-do_futex()<-sys_futex()
    467  (-1)()<-(0)()<-(0)()<-(0)()
    495  __sched_text_start()<-preempt_schedule()<-__spin_unlock_irqrestore()<-rt_mutex_adjust_prio()
    497  __netif_rx_schedule()<-netif_rx()<-loopback_xmit()<-(-1)()
    499  __sched_text_start()<-schedule()<-schedule_timeout()<-sk_wait_data()
    500  tcp_recvmsg()<-sock_common_recvmsg()<-sock_recvmsg()<-(-1)()
    503  __rt_down_read()<-rt_down_read()<-do_futex()<-(-1)()
   1160  __sched_text_start()<-schedule()<-ksoftirqd()<-kthread()
   1433  __rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)()
   1497  child_rip()<-(-1)()<-(0)()<-(0)()
   1936  __sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock()

It looks like the idle time I saw was due to lock contention during
calls to futex_wake, which requires acquiring current->mm->mmap_sem.
Many of the Java threads share an mm, which results in concurrent
access to a common mm.
It looks like in the -rt case there is no special treatment of read
locking, so the read-lock accesses contend in __rt_down_read.  In the
non-rt case, __down_read distinguishes read-lock accesses, and the read
lockers do not contend.
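To make the read-locking difference concrete, here is a toy,
single-threaded C model. It is not the kernel's rw_semaphore code:
struct toy_rwsem and its helpers are hypothetical, and rt_mode simply
assumes, as described above, that on -rt a reader excludes other readers
the way a single rt_mutex owner would.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not kernel API. */
struct toy_rwsem {
	int readers;   /* number of current read holders */
	bool writer;   /* true while a writer holds the lock */
	bool rt_mode;  /* true: the read side behaves like an exclusive rt_mutex */
};

/* Try to take the lock for reading; returns true on success. */
static bool toy_down_read_trylock(struct toy_rwsem *sem)
{
	if (sem->writer)
		return false;
	/* Modelled -rt behaviour: any existing reader blocks a new reader. */
	if (sem->rt_mode && sem->readers > 0)
		return false;
	sem->readers++;
	return true;
}

static void toy_up_read(struct toy_rwsem *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}
```

With rt_mode false two readers hold the lock together; with rt_mode true
the second reader has to wait, which is the serialization showing up in
__rt_down_read in the profile.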

This contention makes things worse, because it delays waking up the
processes waiting on the futex.
See also a snippet of the latency_trace below. 

  -0 2D..2 5821us!: thread_return  (150 20)
  -0 2DN.1 6278us : __sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)()
  -0 2DN.1 6278us : (0)()<-(0)()<-(0)()<-(0)()
java-6648  2D..2 6280us+: thread_return <-0> (20 -4)
java-6648  2D..1 6296us : try_to_wake_up()<-wake_up_process()<-wakeup_next_waiter()<-rt_mutex_slowunlock()
java-6648  2D..1 6296us : rt_mutex_unlock()<-rt_up_read()<-do_futex()<-(-1)()
java-6648  2D..2 6297us : effective_prio <<...>-6673> (-4 -4)
java-6648  2D..2 6297us : __activate_task <<...>-6673> (-4 1)
java-6648  2 6297us < (-11)
java-6648  2 6298us+> sys_futex (00afaf50 0001 0001)
java-6648  2...1 6315us : __sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock()
java-6648  2...1 6315us : __rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)()
java-6648  2D..2 6316us+: deactivate_task  (-4 1)
  -0 2D..2 6318us+: thread_return  (-4 20)
  -0 2DN.1 6327us : __sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)()
  -0 2DN.1 6328us+: (0)()<-(0)()<-(0)()<-(0)()
java-6629  2D..2 6330us+: thread_return <-0> (20 -4)
java-6629  2D..1 6347us : try_to_wake_up()<-wake_up_process()<-wakeup_next_waiter()<-rt_mutex_slowunlock()
java-6629  2D..1 6347us : rt_mutex_unlock()<-rt_up_read()<-futex_wake()<-(-1)()
java-6629  2D..2 6348us : effective_prio  (-4 -4)
java-6629  2D..2 6349us : __activate_task  (-4 1)
java-6629  2 6350us+< (0)
java-6629  2 6352us+> sys_futex (00afc1dc 0001 0001)
java-6629  2...1 6368us : __sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock()
java-6629  2...1 6368us : __rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)()
java-6629  2D..2 6369us+: deactivate_task  (-4 1)
  -0 2D..2 6404us!: thread_return  (-4 20)
  -0 2DN.1 6584us : __sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)()

Thanks.

Tim


Re: Ok, explained.. (was Re: [PATCH] mm: fix page_mkclean_one)

2006-12-29 Thread Andrew Morton
On Fri, 29 Dec 2006 14:42:51 -0800 (PST)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> 
> 
> On Fri, 29 Dec 2006, Andrew Morton wrote:
> > 
> > - The above change means that we do extra writeout.  If a page is dirtied
> >   once, kjournald will write it and then pdflush will come along and
> >   needlessly write it again.
> 
> There's zero extra writeout for any flushing that flushes BY PAGES.
> 
> Only broken flushers that flush by buffer heads (which really really 
> really shouldn't be done any more: welcome to the 21st century) will cause 
> extra writeouts. And those extra writeouts are obviously required for all 
> the dirty state to actually hit the disk - which is the point of the 
> patch.
> 
> So they're not "extra" - they are "required for correct working".

They're extra.  As in "can be optimised away".

> But I can't stress the fact enough that people SHOULD NOT do writeback by 
> buffer heads. The buffer head has been purely an "IO entity" for the last 
> several years now, and it's not a cache entity.

The buffer_head is not an IO container.  It is the kernel's core
representation of a disk block.  Usually (but not always) it is backed by
some memory which is in pagecache.  We can feed buffer_heads into IO
containers via submit_bh(), but that's far from the only thing we use
buffer_heads for.  We should have done s/buffer_head/block/g years ago.

JBD implements physical block-based journalling, so it is 100% appropriate
that JBD deal with these disk blocks using their buffer_head
representation.

That being said, ordered-data mode isn't really part of the JBD journalling
system at all (the data doesn't get journalled!) - ordered-mode is an
add-on to the JBD journal to make the metadata which we're about to journal
point at more-likely-to-be-correct data.

JBD's ordered-mode writeback is just a sync and I see no conceptual
problems with killing its old buffer_head based sync and moving it into the
21st century.

> Anybody who does writeback 
> by buffer heads is basically bypassing the real cache (the page cache), 
> and that's why all the problems happen.
> 
> I think ext3 is terminally crap by now. It still uses buffer heads in 
> places where it really really shouldn't,

The ordered-data mode flush: sure.  The rest of JBD's use of buffer_heads
is quite appropriate.

> and as a result, things like 
> directory accesses are simply slower than they should be. Sadly, I don't 
> think ext4 is going to fix any of this, either.

I thought I fixed the performance problem?

Somewhat nastily, but as ext3 directories are metadata it is appropriate
that modifications to them be done in terms of buffer_heads (ie: blocks).

> It's all just too inherently wrongly designed around the buffer head 
> (which was correct in 1995, but hasn't been correct for a long time in the 
> kernel any more).
> 
> > - Poor old IO accounting broke again.
> 
> No. That's why I used "set_page_dirty()" and did it that strange ugly way 
> ("set page dirty, even though it's already dirty, and even though the very 
> next thing we will do is TestClearPageDirty???").

nfs_set_page_dirty() and reiserfs_set_page_dirty() should now bail if
PageDirty() to avoid needless work.
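A minimal sketch of that early bail-out, assuming a toy page that tracks
only a dirty flag and a counter standing in for the bookkeeping a
set_page_dirty implementation performs. struct toy_page and
toy_set_page_dirty() are hypothetical; real code would test PageDirty()
(or use TestSetPageDirty()) on a struct page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the page state relevant here. */
struct toy_page {
	bool dirty;
	int tag_updates; /* counts the "expensive" work, e.g. tagging and accounting */
};

/* Returns true only if this call moved the page from clean to dirty. */
static bool toy_set_page_dirty(struct toy_page *page)
{
	if (page->dirty)
		return false;     /* already dirty: skip the needless work */
	page->dirty = true;
	page->tag_updates++;  /* the bookkeeping happens once per clean->dirty transition */
	return true;
}
```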

> > - For a long time I've wanted to nuke the current ext3/jbd ordered-data
> >   implementation altogether, and just make kjournald call into the
> >   standard writeback code to do a standard superblock->inodes->pages walk.
> 
> I really would like to see less of the buffer-head-based stuff, and yes, 
> more of the normal inode page walking. I don't think you can "order" 
> accesses within a page anyway, exactly because of memory mapping issues, 
> so any page ordering is not about buffer heads on the page itself, it 
> should be purely about metadata.

In this context ext3's "ordered" mode means "sync the file contents before
journalling the metadata which points at it".
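That ordering can be sketched as a toy commit routine that only records
the order of writes. All names here are hypothetical and nothing reaches
a disk; the point is that every data write of the transaction precedes
the journal commit of the metadata that points at it.

```c
#include <assert.h>

/* Hypothetical event log recording the order in which writes "complete". */
enum toy_event { TOY_DATA_WRITE, TOY_JOURNAL_COMMIT };

struct toy_log {
	enum toy_event events[16];
	int n;
};

static void toy_record(struct toy_log *log, enum toy_event ev)
{
	assert(log->n < 16);
	log->events[log->n++] = ev;
}

/* Ordered-mode style commit: flush (and, in real life, wait on) all dirty
 * data blocks first, then journal the metadata that references them. */
static void toy_ordered_commit(struct toy_log *log, int dirty_data_blocks)
{
	for (int i = 0; i < dirty_data_blocks; i++)
		toy_record(log, TOY_DATA_WRITE);
	toy_record(log, TOY_JOURNAL_COMMIT);
}
```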

> > - It's pretty obnoxious that the VM now sets a clean page "dirty" and
> >   then proceeds to modify its contents.  It would be nice to stop doing
> >   that.
> 
> No. I think this is really the fundamental confusion people had. People 
> thought that setting the page dirty meant that it was no longer being 
> modified.

No.  Setting a page (or bh, or inode) dirty means "this is known to have
been modified".  ie: this cached entity is now out of sync with backing
store.

Ho hum.  I don't care much, really.  But then, I understand how all this
stuff works.  Try explaining to someone the relationship between
pte-dirtiness, page-dirtiness, radix-tree-dirtiness and
buffer_head-dirtiness.



[PATCH 0/6] UML - Locking, whitespace, style, and hotplug fixes

2006-12-29 Thread Jeff Dike
This stuff is all post-2.6.20 material.

I'm fixing the locking through UML, and discovering other nastinesses along
the way.

Patches 1, 4, and 5 fix locking in the console and network drivers.

I have a huge raft of whitespace and style fixes which I will feed in as I
fix the locking in the same files.  The first installment consists of
patches 3 and 6.

Patch 2 fixes problems getting hotplug errors back to the host, which I 
discovered while trolling through that code.

Jeff



[PATCH 6/6] UML - Network driver whitespace and style fixes

2006-12-29 Thread Jeff Dike
Some whitespace and coding style cleanups in the network driver code.

Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>

--
 arch/um/drivers/net_kern.c |   29 +++--
 arch/um/include/net_kern.h |6 +++---
 2 files changed, 18 insertions(+), 17 deletions(-)

Index: linux-2.6.18-mm/arch/um/drivers/net_kern.c
===
--- linux-2.6.18-mm.orig/arch/um/drivers/net_kern.c 2006-12-29 17:42:37.0 -0500
+++ linux-2.6.18-mm/arch/um/drivers/net_kern.c  2006-12-29 18:18:48.0 -0500
@@ -108,7 +108,7 @@ irqreturn_t uml_net_interrupt(int irq, v
 
 out:
spin_unlock(&lp->lock);
-   return(IRQ_HANDLED);
+   return IRQ_HANDLED;
 }
 
 static int uml_net_open(struct net_device *dev)
@@ -239,7 +239,7 @@ static int uml_net_set_mac(struct net_de
set_ether_mac(dev, hwaddr->sa_data);
spin_unlock_irq(&lp->lock);
 
-   return(0);
+   return 0;
 }
 
 static int uml_net_change_mtu(struct net_device *dev, int new_mtu)
@@ -460,7 +460,7 @@ static struct uml_net *find_device(int n
device = NULL;
  out:
spin_unlock(&devices_lock);
-   return(device);
+   return device;
 }
 
 static int eth_parse(char *str, int *index_out, char **str_out,
@@ -511,23 +511,23 @@ static int check_transport(struct transp
 
len = strlen(transport->name);
if(strncmp(eth, transport->name, len))
-   return(0);
+   return 0;
 
eth += len;
if(*eth == ',')
eth++;
else if(*eth != '\0')
-   return(0);
+   return 0;
 
*init_out = kmalloc(transport->setup_size, GFP_KERNEL);
if(*init_out == NULL)
-   return(1);
+   return 1;
 
if(!transport->setup(eth, mac_out, *init_out)){
kfree(*init_out);
*init_out = NULL;
}
-   return(1);
+   return 1;
 }
 
 void register_transport(struct transport *new)
@@ -572,9 +572,9 @@ static int eth_setup_common(char *str, i
eth_configure(index, init, mac, transport);
kfree(init);
}
-   return(1);
+   return 1;
}
-   return(0);
+   return 0;
 }
 
 static int eth_setup(char *str)
@@ -678,7 +678,7 @@ static int net_remove(int n, char **erro
dev = device->dev;
lp = dev->priv;
if(lp->fd > 0)
-return -EBUSY;
+   return -EBUSY;
if(lp->remove != NULL) (*lp->remove)(&lp->user);
unregister_netdev(dev);
platform_device_unregister(&device->pdev);
@@ -693,7 +693,7 @@ static struct mc_device net_mc = {
.name   = "eth",
.config = net_config,
.get_config = NULL,
-.id= net_id,
+   .id = net_id,
.remove = net_remove,
 };
 
@@ -706,7 +706,8 @@ static int uml_inetaddr_event(struct not
void (*proc)(unsigned char *, unsigned char *, void *);
unsigned char addr_buf[4], netmask_buf[4];
 
-   if(dev->open != uml_net_open) return(NOTIFY_DONE);
+   if(dev->open != uml_net_open)
+   return NOTIFY_DONE;
 
lp = dev->priv;
 
@@ -724,7 +725,7 @@ static int uml_inetaddr_event(struct not
memcpy(netmask_buf, &ifa->ifa_mask, sizeof(netmask_buf));
(*proc)(addr_buf, netmask_buf, &lp->user);
}
-   return(NOTIFY_DONE);
+   return NOTIFY_DONE;
 }
 
 struct notifier_block uml_inetaddr_notifier = {
@@ -834,7 +835,7 @@ void *get_output_buffer(int *len_out)
ret = (void *) __get_free_pages(GFP_KERNEL, 0);
if(ret) *len_out = PAGE_SIZE;
else *len_out = 0;
-   return(ret);
+   return ret;
 }
 
 void free_output_buffer(void *buffer)
Index: linux-2.6.18-mm/arch/um/include/net_kern.h
===
--- linux-2.6.18-mm.orig/arch/um/include/net_kern.h 2006-12-29 17:42:37.0 -0500
+++ linux-2.6.18-mm/arch/um/include/net_kern.h  2006-12-29 18:19:07.0 -0500
@@ -1,4 +1,4 @@
-/* 
+/*
  * Copyright (C) 2002 Jeff Dike ([EMAIL PROTECTED])
  * Licensed under the GPL
  */
@@ -36,7 +36,7 @@ struct uml_net_private {
void (*remove)(void *);
int (*read)(int, struct sk_buff **skb, struct uml_net_private *);
int (*write)(int, struct sk_buff **skb, struct uml_net_private *);
-   
+
void (*add_address)(unsigned char *, unsigned char *, void *);
void (*delete_address)(unsigned char *, unsigned char *, void *);
int (*set_mtu)(int mtu, void *);
@@ -63,7 +63,7 @@ struct transport {
 extern struct net_device *ether_init(int);
 extern unsigned short ether_protocol(struct sk_buff *);
 extern struct sk_buff *ether_adjust_skb(struct sk_buff *skb, int extra);
-extern int tap_setup_common(char *str, char *type, char **dev_name, 
+extern int tap_setup_com

[PATCH 5/6] UML - Add locking to network transport registration

2006-12-29 Thread Jeff Dike
The registration of host network transports needed some locking.  The
transport list itself is locked, but calls to the registration
routines are not.  This is compensated for by checking that a
transport structure is not yet on any list.

I also took the opportunity to const all fields in the transport
structure except the list, which obviously can be modified.

Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>

--
 arch/um/drivers/net_kern.c |9 +
 arch/um/include/net_kern.h |8 
 2 files changed, 9 insertions(+), 8 deletions(-)

Index: linux-2.6.18-mm/arch/um/drivers/net_kern.c
===
--- linux-2.6.18-mm.orig/arch/um/drivers/net_kern.c 2006-12-29 12:58:59.0 -0500
+++ linux-2.6.18-mm/arch/um/drivers/net_kern.c  2006-12-29 12:59:19.0 -0500
@@ -498,10 +498,8 @@ struct eth_init {
int index;
 };
 
-/* Filled in at boot time.  Will need locking if the transports become
- * modular.
- */
-struct list_head transports = LIST_HEAD_INIT(transports);
+static DEFINE_SPINLOCK(transports_lock);
+static LIST_HEAD(transports);
 
 /* Filled in during early boot */
 struct list_head eth_cmd_line = LIST_HEAD_INIT(eth_cmd_line);
@@ -540,7 +538,10 @@ void register_transport(struct transport
char *mac = NULL;
int match;
 
+   spin_lock(&transports_lock);
+   BUG_ON(!list_empty(&new->list));
list_add(&new->list, &transports);
+   spin_unlock(&transports_lock);
 
list_for_each_safe(ele, next, &eth_cmd_line){
eth = list_entry(ele, struct eth_init, list);
Index: linux-2.6.18-mm/arch/um/include/net_kern.h
===
--- linux-2.6.18-mm.orig/arch/um/include/net_kern.h 2006-12-29 12:23:02.0 -0500
+++ linux-2.6.18-mm/arch/um/include/net_kern.h  2006-12-29 12:59:19.0 -0500
@@ -52,12 +52,12 @@ struct net_kern_info {
 
 struct transport {
struct list_head list;
-   char *name;
-   int (*setup)(char *, char **, void *);
+   const char *name;
+   int (* const setup)(char *, char **, void *);
const struct net_user_info *user;
const struct net_kern_info *kern;
-   int private_size;
-   int setup_size;
+   const int private_size;
+   const int setup_size;
 };
 
 extern struct net_device *ether_init(int);
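The registration check in this patch relies on an unlinked list_head
pointing at itself, so list_empty() on the entry is true until it has been
added. The user-space sketch below re-implements just enough of that idea
to show it: struct node, struct toy_transport and toy_register_transport()
are hypothetical, the sketch returns -1 where the patch uses BUG_ON(), and
it omits transports_lock (which the patch holds across the check and the
add) because it is single-threaded. In the sketch, as with list_empty() in
the kernel, the check is only meaningful if each entry was initialised to
point at itself before registration.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal user-space stand-in for the kernel's circular struct list_head. */
struct node {
	struct node *next, *prev;
};

static void node_init(struct node *n) { n->next = n->prev = n; }
static bool node_unlinked(const struct node *n) { return n->next == n; }

/* Insert n right after head, like list_add(). */
static void node_add(struct node *n, struct node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

struct toy_transport {
	struct node list;
	const char *name;
};

/* Refuses a transport whose entry is already on a list. */
static int toy_register_transport(struct node *transports,
				  struct toy_transport *t)
{
	if (!node_unlinked(&t->list))
		return -1;
	node_add(&t->list, transports);
	return 0;
}
```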



[PATCH 1/6] UML - Console locking fixes

2006-12-29 Thread Jeff Dike
Clean up the console driver locking.  There are various problems here,
including sleeping under a spinlock and spinlock recursion, some of
which are fixed here.  This patch deals with the locking involved with
opens and closes.  The problem is that an mconsole request to change a
console's configuration can race with an open.  Changing a
configuration should only be done when a console isn't opened.  Also,
an open must be looking at a stable configuration.  In addition, a get
configuration request must observe the same locking since it must also
see a stable configuration.  With the old locking, it was possible for
this to hang indefinitely in some cases because open would block for a
long time waiting for a connection from the host while holding the
lock needed by the mconsole request.

As explained in the long comment, this is fixed by adding a spinlock
for the use count and configuration and a mutex for the actual open
and close.
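The policy described above (configuration may change only while the host
side is closed, and a conflicting setup fails at once instead of waiting
for an open) can be modelled in a few lines. This is a hypothetical,
single-threaded sketch: the count lock and open mutex are omitted,
tty_count stands in for the caller's tty->count rather than a private
counter, and returning -EBUSY is an arbitrary choice for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a console line; not the UML struct line. */
struct toy_line {
	bool valid;
	bool host_open;     /* host side set up by the first opener */
	const char *config;
};

/* tty_count: tty->count as seen by this open (1 means first open). */
static int toy_line_open(struct toy_line *line, int tty_count)
{
	if (!line->valid)
		return -ENODEV;
	if (tty_count > 1)
		return 0;            /* host side is already up */
	line->host_open = true;  /* first open: set up the host side */
	return 0;
}

/* tty_count: tty->count as seen by this close (1 means last close). */
static void toy_line_close(struct toy_line *line, int tty_count)
{
	if (tty_count == 1)
		line->host_open = false;
}

static int toy_line_setup(struct toy_line *line, const char *config)
{
	if (line->host_open)
		return -EBUSY;       /* fail immediately rather than wait */
	line->config = config;
	line->valid = true;
	return 0;
}
```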

Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>

--
 arch/um/drivers/line.c  |  188 ++--
 arch/um/drivers/stdio_console.c |6 -
 arch/um/include/line.h  |   14 +-
 3 files changed, 135 insertions(+), 73 deletions(-)

Index: linux-2.6.18-mm/arch/um/drivers/line.c
===
--- linux-2.6.18-mm.orig/arch/um/drivers/line.c 2006-12-29 15:12:45.0 -0500
+++ linux-2.6.18-mm/arch/um/drivers/line.c  2006-12-29 17:26:26.0 -0500
@@ -191,7 +191,6 @@ void line_flush_buffer(struct tty_struct
/*XXX: copied from line_write, verify if it is correct!*/
if(tty->stopped)
return;
-   //return 0;
 
spin_lock_irqsave(&line->lock, flags);
err = flush_buffer(line);
@@ -421,42 +420,84 @@ int line_setup_irq(int fd, int input, in
return err;
 }
 
+/* Normally, a driver like this can rely mostly on the tty layer
+ * locking, particularly when it comes to the driver structure.
+ * However, in this case, mconsole requests can come in "from the
+ * side", and race with opens and closes.
+ *
+ * The problem comes from line_setup not wanting to sleep if
+ * the device is open or being opened.  This can happen because the
+ * first opener of a device is responsible for setting it up on the
+ * host, and that can sleep.  The open of a port device will sleep
+ * until someone telnets to it.
+ *
+ * The obvious solution of putting everything under a mutex fails
+ * because then trying (and failing) to change the configuration of an
+ * open(ing) device will block until the open finishes.  The right
+ * thing to happen is for it to fail immediately.
+ *
+ * We can put the opening (and closing) of the host device under a
+ * separate lock, but that has to be taken before the count lock is
+ * released.  Otherwise, you open a window in which another open can
+ * come through and assume that the host side is opened and working.
+ *
+ * So, if the tty count is one, open will take the open mutex
+ * inside the count lock.  Otherwise, it just returns. This will sleep
+ * if the last close is pending, and will block a setup or get_config,
+ * but that should not last long.
+ *
+ * So, what we end up with is that open and close take the count lock.
+ * If the first open or last close are happening, then the open mutex
+ * is taken inside the count lock and the host opening or closing is done.
+ *
+ * setup and get_config only take the count lock.  setup modifies the
+ * device configuration only if the open count is zero.  Arbitrarily
+ * long blocking of setup doesn't happen because something would have to be
+ * waiting for an open to happen.  However, a second open with
+ * tty->count == 1 can't happen, and a close can't happen until the open
+ * had finished.
+ *
+ * We can't maintain our own count here because the tty layer doesn't
+ * match opens and closes.  It will call close if an open failed, and
+ * a tty hangup will result in excess closes.  So, we rely on
+ * tty->count instead.  It is one on both the first open and last close.
+ */
+
 int line_open(struct line *lines, struct tty_struct *tty)
 {
-   struct line *line;
+   struct line *line = &lines[tty->index];
int err = -ENODEV;
 
-   line = &lines[tty->index];
-   tty->driver_data = line;
+   spin_lock(&line->count_lock);
+   if(!line->valid)
+   goto out_unlock;
+
+   err = 0;
+   if(tty->count > 1)
+   goto out_unlock;
 
-   /* The IRQ which takes this lock is not yet enabled and won't be run
-* before the end, so we don't need to use spin_lock_irq.*/
-   spin_lock(&line->lock);
+   mutex_lock(&line->open_mutex);
+   spin_unlock(&line->count_lock);
 
tty->driver_data = line;
line->tty = tty;
-   if(!line->valid)
-   goto out;
 
-   if(tty->count == 1){
-   /* Here the device is opened, if necessary, and interrupt
- 

[PATCH 2/6] UML - Return hotplug errors to host

2006-12-29 Thread Jeff Dike
I noticed that errors happening while hotplugging devices from the
host were never returned back to the mconsole client.  In some cases,
success was returned instead of even an information-free error.

This patch cleans that up by having the low-level configuration code
pass back an error string along with an error code.  At the top level,
which knows whether it is early boot time or responding to an mconsole
request, the string is printk'd or returned to the mconsole client.

There are also whitespace and trivial code cleanups in the surrounding code.
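The error-string convention can be illustrated with a user-space sketch of
the device-number parsing at the top of line_setup(). The function
toy_parse_line_number() is hypothetical and uses strtoul() in place of
simple_strtoul(); treating a leading '=' as "all devices" (n == -1) is an
assumption based on the n == -1 check at the end of line_setup().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parses "N=config" or "=config".  On success stores the device number
 * (-1 for all devices) and a pointer past the '='; on failure returns
 * -EINVAL and points *error_out at a human-readable message. */
static int toy_parse_line_number(const char *init, unsigned int num,
				 int *n_out, const char **rest_out,
				 const char **error_out)
{
	char *end;
	unsigned long v;

	if (*init == '=') {          /* "=config": applies to every device */
		*n_out = -1;
		*rest_out = init + 1;
		return 0;
	}

	v = strtoul(init, &end, 0);
	if (end == init || *end != '=') {
		*error_out = "Couldn't parse device number";
		return -EINVAL;
	}
	if (v >= num) {
		*error_out = "Device number out of range";
		return -EINVAL;
	}
	*n_out = (int) v;
	*rest_out = end + 1;         /* skip the '=' */
	return 0;
}
```

The caller then decides whether to printk the string (early boot) or hand
it back to the mconsole client.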

Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>

--
 arch/um/drivers/chan_kern.c |   77 ++-
 arch/um/drivers/line.c  |   66 +++-
 arch/um/drivers/mconsole_kern.c |   40 
 arch/um/drivers/net_kern.c  |   97 +
 arch/um/drivers/ssl.c   |   24 +--
 arch/um/drivers/stdio_console.c |   22 --
 arch/um/drivers/ubd_kern.c  |  132 ++--
 arch/um/include/chan_kern.h |2 
 arch/um/include/line.h  |8 +-
 arch/um/include/mconsole_kern.h |   15 
 arch/um/kernel/tt/gdb.c |4 -
 arch/um/kernel/tt/gdb_kern.c|4 -
 12 files changed, 253 insertions(+), 238 deletions(-)

Index: linux-2.6.18-mm/arch/um/drivers/line.c
===
--- linux-2.6.18-mm.orig/arch/um/drivers/line.c 2006-12-29 17:26:26.0 -0500
+++ linux-2.6.18-mm/arch/um/drivers/line.c  2006-12-29 17:26:54.0 -0500
@@ -549,14 +549,16 @@ void close_lines(struct line *lines, int
close_chan(&lines[i].chan_list, 0);
 }
 
-static void setup_one_line(struct line *lines, int n, char *init, int init_prio)
+static int setup_one_line(struct line *lines, int n, char *init, int init_prio,
+ char **error_out)
 {
struct line *line = &lines[n];
+   int err = -EINVAL;
 
spin_lock(&line->count_lock);
 
if(line->tty != NULL){
-   printk("line_setup - device %d is open\n", n);
+   *error_out = "Device is already open";
goto out;
}
 
@@ -569,18 +571,22 @@ static void setup_one_line(struct line *
line->valid = 1;
}
}
+   err = 0;
 out:
spin_unlock(&line->count_lock);
+   return err;
 }
 
 /* Common setup code for both startup command line and mconsole initialization.
  * @lines contains the array (of size @num) to modify;
  * @init is the setup string;
+ * @error_out is an error string in the case of failure;
  */
 
-int line_setup(struct line *lines, unsigned int num, char *init)
+int line_setup(struct line *lines, unsigned int num, char *init,
+  char **error_out)
 {
-   int i, n;
+   int i, n, err;
char *end;
 
if(*init == '=') {
@@ -591,52 +597,56 @@ int line_setup(struct line *lines, unsig
else {
n = simple_strtoul(init, &end, 0);
if(*end != '='){
-   printk(KERN_ERR "line_setup failed to parse \"%s\"\n",
-  init);
-   return 0;
+   *error_out = "Couldn't parse device number";
+   return -EINVAL;
}
init = end;
}
init++;
 
if (n >= (signed int) num) {
-   printk("line_setup - %d out of range ((0 ... %d) allowed)\n",
-  n, num - 1);
-   return 0;
+   *error_out = "Device number out of range";
+   return -EINVAL;
+   }
+   else if (n >= 0){
+   err = setup_one_line(lines, n, init, INIT_ONE, error_out);
+   if(err)
+   return err;
}
-   else if (n >= 0)
-   setup_one_line(lines, n, init, INIT_ONE);
else {
-   for(i = 0; i < num; i++)
-   setup_one_line(lines, i, init, INIT_ALL);
+   for(i = 0; i < num; i++){
+   err = setup_one_line(lines, i, init, INIT_ALL,
+error_out);
+   if(err)
+   return err;
+   }
}
return n == -1 ? num : n;
 }
 
 int line_config(struct line *lines, unsigned int num, char *str,
-   const struct chan_opts *opts)
+   const struct chan_opts *opts, char **error_out)
 {
struct line *line;
char *new;
int n;
 
if(*str == '='){
-   printk("line_config - can't configure all devices from "
-  "mconsole\n");
-   return 1;
+   *error_out = "Can't configure all devices from mconsole";
+   return -EINVAL;
}
 
new = kstrdup(str, GFP_KERNEL);
if(new == NULL){
-   printk("lin

[PATCH 3/6] UML - Console whitespace and comment tidying

2006-12-29 Thread Jeff Dike
Some comment and whitespace cleanups in the console and mconsole code.

Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>

--
 arch/um/drivers/stdio_console.c |2 --
 arch/um/include/mconsole_kern.h |2 +-
 2 files changed, 1 insertion(+), 3 deletions(-)


Index: linux-2.6.18-mm/arch/um/drivers/stdio_console.c
===
--- linux-2.6.18-mm.orig/arch/um/drivers/stdio_console.c 2006-12-29 17:26:54.0 -0500
+++ linux-2.6.18-mm/arch/um/drivers/stdio_console.c 2006-12-29 18:20:43.0 -0500
@@ -30,8 +30,6 @@
 
 #define MAX_TTYS (16)
 
-/* 
- */
-
 /* Referenced only by tty_driver below - presumably it's locked correctly
  * by the tty driver.
  */
Index: linux-2.6.18-mm/arch/um/include/mconsole_kern.h
===
--- linux-2.6.18-mm.orig/arch/um/include/mconsole_kern.h 2006-12-29 17:26:54.0 -0500
+++ linux-2.6.18-mm/arch/um/include/mconsole_kern.h 2006-12-29 18:21:37.0 -0500
@@ -20,7 +20,7 @@ struct mc_device {
char *name;
int (*config)(char *, char **);
int (*get_config)(char *, char *, int, char **);
-int (*id)(char **, int *, int *);
+   int (*id)(char **, int *, int *);
int (*remove)(int, char **);
 };
 



[PATCH 4/6] UML - Lock the irqs_to_free list

2006-12-29 Thread Jeff Dike
Fix the locking around the irqs_to_free list (i.e. add some).

Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>

--
 arch/um/drivers/chan_kern.c |   21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

Index: linux-2.6.18-mm/arch/um/drivers/chan_kern.c
===
--- linux-2.6.18-mm.orig/arch/um/drivers/chan_kern.c 2006-12-29 15:27:16.0 -0500
+++ linux-2.6.18-mm/arch/um/drivers/chan_kern.c 2006-12-29 16:38:43.0 -0500
@@ -222,15 +222,28 @@ void enable_chan(struct line *line)
}
 }
 
+/* Items are added in IRQ context, when free_irq can't be called, and
+ * removed in process context, when it can.
+ * This handles interrupt sources which disappear, and which need to
+ * be permanently disabled.  This is discovered in IRQ context, but
+ * the freeing of the IRQ must be done later.
+ */
+static DEFINE_SPINLOCK(irqs_to_free_lock);
 static LIST_HEAD(irqs_to_free);
 
 void free_irqs(void)
 {
struct chan *chan;
+   LIST_HEAD(list);
+   struct list_head *ele;
+
+   spin_lock_irq(&irqs_to_free_lock);
+   list_splice_init(&irqs_to_free, &list);
+   INIT_LIST_HEAD(&irqs_to_free);
+   spin_unlock_irq(&irqs_to_free_lock);
 
-   while(!list_empty(&irqs_to_free)){
-   chan = list_entry(irqs_to_free.next, struct chan, free_list);
-   list_del(&chan->free_list);
+   list_for_each(ele, &list){
+   chan = list_entry(ele, struct chan, free_list);
 
if(chan->input)
free_irq(chan->line->driver->read_irq, chan);
@@ -246,7 +259,9 @@ static void close_one_chan(struct chan *
return;
 
if(delay_free_irq){
+   spin_lock_irq(&irqs_to_free_lock);
list_add(&chan->free_list, &irqs_to_free);
+   spin_unlock_irq(&irqs_to_free_lock);
}
else {
if(chan->input)
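free_irqs() above empties the shared list under the lock and then walks a
private copy, so free_irq() never runs with irqs_to_free_lock held. A
user-space sketch of that "detach under the lock, process outside it"
pattern follows. It is hypothetical: a singly linked list replaces
list_head, a C11 atomic_flag spin loop replaces spin_lock_irq(), and
recording the IRQ number stands in for calling free_irq().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_chan {
	int irq;
	struct toy_chan *next;
};

static atomic_flag toy_lock = ATOMIC_FLAG_INIT;
static struct toy_chan *toy_irqs_to_free;   /* shared list head */

static void toy_spin_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&toy_lock, memory_order_acquire))
		; /* busy-wait until the holder clears the flag */
}

static void toy_spin_unlock(void)
{
	atomic_flag_clear_explicit(&toy_lock, memory_order_release);
}

/* Producer side (IRQ context in the real code): only links the entry in. */
static void toy_queue_free(struct toy_chan *chan)
{
	toy_spin_lock();
	chan->next = toy_irqs_to_free;
	toy_irqs_to_free = chan;
	toy_spin_unlock();
}

/* Consumer side: steal the whole list under the lock, then walk it
 * without the lock.  Returns the number of entries processed. */
static int toy_free_irqs(int *freed_irqs, int max)
{
	struct toy_chan *list, *chan, *next;
	int n = 0;

	toy_spin_lock();
	list = toy_irqs_to_free;
	toy_irqs_to_free = NULL;
	toy_spin_unlock();

	for (chan = list; chan != NULL; chan = next) {
		next = chan->next;              /* read before the entry is released */
		if (n < max)
			freed_irqs[n] = chan->irq;  /* stands in for free_irq() */
		n++;
	}
	return n;
}
```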



Re: Ok, explained.. (was Re: [PATCH] mm: fix page_mkclean_one)

2006-12-29 Thread Theodore Tso
On Fri, Dec 29, 2006 at 02:42:51PM -0800, Linus Torvalds wrote:
> I think ext3 is terminally crap by now. It still uses buffer heads in 
> places where it really really shouldn't, and as a result, things like 
> directory accesses are simply slower than they should be. Sadly, I don't 
> think ext4 is going to fix any of this, either.

Not just ext3; ocfs2 is using the jbd layer as well.  I think we're
going to have to put this (a rework of jbd2 to use the page cache) on
the ext4 todo list, and work with the ocfs2 folks to try to come up
with something that suits their needs as well.  Fortunately we have
this filesystem/storage summit thing coming up in the next few months,
and we can try to get some discussion going on the linux-ext4 mailing
list in the meantime.  Unfortunately, I don't think this is going to
be trivial.

If we do get this fixed for ext4, one interesting question is whether
people would accept a patch to backport the fixes to ext3, given the
grief this is causing the page I/O and VM routines.  OTOH, reiser3
probably has the same problems, and I suspect the changes to ext3 to
make it avoid buffer heads, especially in order to support
filesystem blocksizes < pagesize, are going to be sufficiently risky
in terms of introducing regressions to ext3 that they would probably
be rejected on those grounds.  So unfortunately, we probably are going
to have to support flushes via buffer heads for the foreseeable
future.

- Ted


[PATCH] lock stat for -rt 2.6.20-rc2-rt2 [was Re: 2.6.19-rt14 slowdown compared to 2.6.19]

2006-12-29 Thread hui
On Tue, Dec 26, 2006 at 04:51:21PM -0800, Chen, Tim C wrote:
> Ingo Molnar wrote:
> > If you'd like to profile this yourself then the lowest-cost way of
> > profiling lock contention on -rt is to use the yum kernel and run the
> > attached trace-it-lock-prof.c code on the box while your workload is
> > in 'steady state' (and is showing those extended idle times):
> > 
> >   ./trace-it-lock-prof > trace.txt
>
> Thanks for the pointer.  Will let you know of any relevant traces.

Tim,

http://mmlinux.sourceforge.net/public/patch-2.6.20-rc2-rt2.lock_stat.patch

You can also apply this patch to get more precise statistics down to
the lock. For example:

...

[50, 30, 279 :: 1, 0]   {tty_ldisc_try, -, 0}
[5, 5, 0 :: 19, 0]  {alloc_super, fs/super.c, 76}
[5, 5, 3 :: 1, 0]   {__free_pages_ok, -, 0}
[5728, 862, 156 :: 2, 0]    {journal_init_common, fs/jbd/journal.c, 667}
[594713, 79020, 4287 :: 60818, 0]   {inode_init_once, fs/inode.c, 193}
[602, 0, 0 :: 1, 0] {lru_cache_add_active, -, 0}
[63, 5, 59 :: 1, 0] {lookup_mnt, -, 0}
[6425, 378, 103 :: 24, 0]   {initialize_tty_struct, drivers/char/tty_io.c, 3530}
[6708, 1, 225 :: 1, 0]  {file_move, -, 0}
[67, 8, 15 :: 1, 0] {do_lookup, -, 0}
[69, 0, 0 :: 1, 0]  {exit_mmap, -, 0}
[7, 0, 0 :: 1, 0]   {uart_set_options, drivers/serial/serial_core.c, 1876}
[76, 0, 0 :: 1, 0]  {get_zone_pcp, -, 0}
[, 5, 9 :: 1, 0]    {as_work_handler, -, 0}
[8689, 0, 0 :: 15, 0]   {create_workqueue_thread, kernel/workqueue.c, 474}
[89, 7, 6 :: 195, 0]    {sighand_ctor, kernel/fork.c, 1474}
@contention events = 1791177
@found = 21

This is the output from /proc/lock_stat/contention. The first column is
the number of contentions that resulted in a full block of the task, the
second is the number of times the mutex owner was active on a per-CPU
scheduler run queue, and the third is the number of times Steve Rostedt's
ownership handoff code averted a full block. Peter Zijlstra used it
initially during his files_lock work.
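For post-processing, one line of /proc/lock_stat/contention in the format
shown above can be split with sscanf(). This parser is a hypothetical
sketch written against the sample output only; the two fields after "::"
are not described in this mail, so it stores them without naming them.

```c
#include <assert.h>
#include <stdio.h>

struct toy_lock_stat {
	unsigned long blocked;        /* contentions that fully blocked the task */
	unsigned long owner_running;  /* owner was active on a run queue */
	unsigned long handoff;        /* full block averted by ownership handoff */
	unsigned long extra[2];       /* the two fields after "::" */
	char func[64];
	char file[128];               /* "-" when no file is recorded */
	int line;
};

/* Returns 1 if s matches "[a, b, c :: d, e] {func, file, line}", else 0. */
static int toy_parse_lock_stat(const char *s, struct toy_lock_stat *st)
{
	int got = sscanf(s, " [%lu, %lu, %lu :: %lu, %lu] {%63[^,], %127[^,], %d}",
			 &st->blocked, &st->owner_running, &st->handoff,
			 &st->extra[0], &st->extra[1],
			 st->func, st->file, &st->line);
	return got == 8;
}
```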

The overhead of the patch is very low, since it only records statistics in
the slow path of the rt-mutex implementation.

Writing to that file clears all of the stats for a fresh benchmark run.
This should pinpoint where any contention happens in -rt. In general, -rt
should do about as well as the stock kernel minus the overhead of interrupt
threads.

Since the last release, I've added checks for whether the task is running
as "current" on a run queue to see if adaptive spins would be useful in -rt.

These new stats show that only a small percentage of events would benefit
from adaptive spinning in front of an rt-mutex, so any implementation of it
would have little impact on the system. It's not the mechanism but the raw
MP work itself that contributes to the good MP performance of Linux.

Apply and have fun.

bill



[git patches] ocfs2 fixes

2006-12-29 Thread Mark Fasheh
Hi Linus,
Here are some 2.6.20 fixes for ocfs2. The patch by Zhen Wei isn't
really a fix, but a very small amount of support for a feature which is
mostly implemented in ocfs2-tools. Considering it's just a single attribute
export via configfs, I'd say it's pretty safe to merge.

Please pull from 'upstream-linus' branch of
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2.git upstream-linus

to receive the following updates:

 fs/ocfs2/aops.c  |   24 +---
 fs/ocfs2/cluster/heartbeat.c |   17 +
 fs/ocfs2/dlmglue.c   |   10 +-
 fs/ocfs2/file.c  |   13 +++--
 4 files changed, 54 insertions(+), 10 deletions(-)

Mark Fasheh:
  ocfs2: don't print error in ocfs2_permission()
  ocfs2: Allow direct I/O read past end of file
  ocfs2: ignore NULL vfsmnt in ocfs2_should_update_atime()
  ocfs2: always unmap in ocfs2_data_convert_worker()

Zhen Wei:
  ocfs2: export heartbeat thread pid via configfs
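The aops.c change ("Allow direct I/O read past end of file") reduces to two checks. Below is a pure-function model of those checks. The names dio_check() and enum dio_decision are hypothetical. The model takes no locks, and inode_blocks stands for the allocated size that ocfs2_clusters_to_blocks() computes from ip_clusters.

```c
#include <assert.h>
#include <stdint.h>

/* Outcomes of the bounds checks in ocfs2_direct_IO_get_blocks(). */
enum dio_decision { DIO_HOLE, DIO_EIO, DIO_MAP };

/*
 * iblock:       first block of the request
 * max_blocks:   number of blocks requested
 * inode_blocks: allocated size of the file, in blocks
 * create:       nonzero for a write
 */
static enum dio_decision dio_check(uint64_t iblock, unsigned long max_blocks,
				   uint64_t inode_blocks, int create)
{
	/* A read that starts at or beyond the allocation sees a hole. */
	if (!create && iblock >= inode_blocks)
		return DIO_HOLE;
	/* A write that would extend the file is refused. */
	if (create && iblock + max_blocks > inode_blocks)
		return DIO_EIO;
	return DIO_MAP;
}
```

Before the patch, any request with iblock + max_blocks past the allocation returned -EIO, including a read that merely straddles EOF. In the model, such a read now falls through to DIO_MAP.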

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index ef6cd30..93628b0 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -540,8 +540,7 @@ static int ocfs2_direct_IO_get_blocks(st
 struct buffer_head *bh_result, int create)
 {
int ret;
-   u64 vbo_max; /* file offset, max_blocks from iblock */
-   u64 p_blkno;
+   u64 p_blkno, inode_blocks;
int contig_blocks;
unsigned char blocksize_bits = inode->i_sb->s_blocksize_bits;
unsigned long max_blocks = bh_result->b_size >> inode->i_blkbits;
@@ -550,12 +549,23 @@ static int ocfs2_direct_IO_get_blocks(st
 * nicely aligned and of the right size, so there's no need
 * for us to check any of that. */
 
-   vbo_max = ((u64)iblock + max_blocks) << blocksize_bits;
-
spin_lock(&OCFS2_I(inode)->ip_lock);
-   if ((iblock + max_blocks) >
-   ocfs2_clusters_to_blocks(inode->i_sb,
-OCFS2_I(inode)->ip_clusters)) {
+   inode_blocks = ocfs2_clusters_to_blocks(inode->i_sb,
+   OCFS2_I(inode)->ip_clusters);
+
+   /*
+* For a read which begins past the end of file, we return a hole.
+*/
+   if (!create && (iblock >= inode_blocks)) {
+   spin_unlock(&OCFS2_I(inode)->ip_lock);
+   ret = 0;
+   goto bail;
+   }
+
+   /*
+* Any write past EOF is not allowed because we'd be extending.
+*/
+   if (create && (iblock + max_blocks) > inode_blocks) {
spin_unlock(&OCFS2_I(inode)->ip_lock);
ret = -EIO;
goto bail;
diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index a25ef5a..277ca67 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -1447,6 +1447,15 @@ out:
return ret;
 }
 
+static ssize_t o2hb_region_pid_read(struct o2hb_region *reg,
+  char *page)
+{
+   if (!reg->hr_task)
+   return 0;
+
+   return sprintf(page, "%u\n", reg->hr_task->pid);
+}
+
 struct o2hb_region_attribute {
struct configfs_attribute attr;
ssize_t (*show)(struct o2hb_region *, char *);
@@ -1485,11 +1494,19 @@ static struct o2hb_region_attribute o2hb
.store  = o2hb_region_dev_write,
 };
 
+static struct o2hb_region_attribute o2hb_region_attr_pid = {
+   .attr   = { .ca_owner = THIS_MODULE,
+   .ca_name = "pid",
+   .ca_mode = S_IRUGO | S_IRUSR },
+   .show   = o2hb_region_pid_read,
+};
+
 static struct configfs_attribute *o2hb_region_attrs[] = {
&o2hb_region_attr_block_bytes.attr,
&o2hb_region_attr_start_block.attr,
&o2hb_region_attr_blocks.attr,
&o2hb_region_attr_dev.attr,
+   &o2hb_region_attr_pid.attr,
NULL,
 };
 
diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index e622013..e335541 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -2718,6 +2718,15 @@ static int ocfs2_data_convert_worker(str
inode = ocfs2_lock_res_inode(lockres);
mapping = inode->i_mapping;
 
+   /*
+* We need this before the filemap_fdatawrite() so that it can
+* transfer the dirty bit from the PTE to the
+* page. Unfortunately this means that even for EX->PR
+* downconverts, we'll lose our mappings and have to build
+* them up again.
+*/
+   unmap_mapping_range(mapping, 0, 0, 0);
+
if (filemap_fdatawrite(mapping)) {
mlog(ML_ERROR, "Could not sync inode %llu for downconvert!",
 (unsigned long long)OCFS2_I(inode)->ip_blkno);
@@ -2725,7 +2734,6 @@ static int ocfs2_data_convert_worker(str
sync_mapping_buffers(mapping);
if (blocking == LKM_EXMODE) {
truncate_inode_pages(mapping, 0);
-   unmap_mapping_range(mapping, 0, 

Re: VM: Fix nasty and subtle race in shared mmap'ed page writeback

2006-12-29 Thread Andrea Gelmini
On Fri, Dec 29, 2006 at 06:59:02PM +, Linux Kernel Mailing List wrote:
> Gitweb: 
> http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7658cc289288b8ae7dd2c2224549a048431222b3
> Commit: 7658cc289288b8ae7dd2c2224549a048431222b3
> Parent: 3bf8ba38f38d3647368e4edcf7d019f9f8d9184a
> Author: Linus Torvalds <[EMAIL PROTECTED]>
> AuthorDate: Fri Dec 29 10:00:58 2006 -0800
> Committer:  Linus Torvalds <[EMAIL PROTECTED]>
> CommitDate: Fri Dec 29 10:00:58 2006 -0800
> 
> VM: Fix nasty and subtle race in shared mmap'ed page writeback

With 2.6.20-rc2-git1, which contains this patch, I no longer get Berkeley
DB corruption with Klibido[1].
I'm afraid a lot of software projects switched to SQLite[2] from BDB[3]
because of the bug this patch fixes (e.g. http://bogofilter.sourceforge.net/).
For years, I had also thought it was a userland problem.

Ciao,
gelma

---
[1] http://klibido.sourceforge.net/
[2] http://www.sqlite.org/
[3] http://www.oracle.com/database/berkeley-db/index.html


Re: 2.6.20-rc2: kernel BUG at include/asm/dma-mapping.h:110!

2006-12-29 Thread Stefan Richter
Benjamin Herrenschmidt wrote:
>> Bisecting has identified this commit:
>> 
>> commit 9b7d9c096dd4e4baacc21b2588662bbb56f36c4e
>> Author: Stefan Richter <[EMAIL PROTECTED]>
>> Date:   Wed Nov 22 21:44:34 2006 +0100
>> 
>> ieee1394: sbp2: convert from PCI DMA to generic DMA
>> 
>> API conversion without change in functionality
>> 
>> Signed-off-by: Stefan Richter <[EMAIL PROTECTED]>
>> 
>> 
>> I'm only seeing this on ppc64, ppc32 seems to be working fine.
> 
> The patch looks totally bogus to me. It's passing a random struct device
> from the hbsp host data structure to the dma_map_* routines. which they
> can't do anything about.
[...]
> So if you are to pass a struct device pointer to dma_map_*, use the one
> inside the pci_dev of the host. Or have the host driver provide you with
> the struct device pointer (which is the one from the pci_dev * for PCI
> implementations, and others give you what they are on, assuming the
> platform can do dma-* on that device).
[...]

The parent device of my bogus fw-host device should do the trick. Alas, I
can't test on ppc64 or with anything other than ohci1394-driven
controllers...
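A toy model shows why the parent pointer matters. Everything here is hypothetical (toy_device, toy_host, toy_dma_map()). The model only mirrors the report: DMA mapping against the FireWire host device failed on ppc64, while its parent is the PCI device the platform can do DMA on.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; not the kernel's struct device or hpsb_host. */
struct toy_device {
	struct toy_device *parent;
	int can_dma;            /* set only for the bus (e.g. PCI) device */
	const char *name;
};

struct toy_host {
	struct toy_device device;   /* FireWire host "class" device */
};

/* Mock dma_map_single()-style call: fails for a device the platform
 * has no DMA setup for. */
static int toy_dma_map(const struct toy_device *dev)
{
	return (dev && dev->can_dma) ? 0 : -1;
}
```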


From: Stefan Richter <[EMAIL PROTECTED]>
Subject: ieee1394: sbp2: fix bogus dma mapping

Need to use a PCI device, not a FireWire host device.  Problem found by
Andreas Schwab, mistake pointed out by Benjamin Herrenschmidt.
http://ozlabs.org/pipermail/linuxppc-dev/2006-December/029595.html

Signed-off-by: Stefan Richter <[EMAIL PROTECTED]>
---
 drivers/ieee1394/sbp2.c |   73 +---
 1 file changed, 40 insertions(+), 33 deletions(-)

Index: linux-2.6.20-rc2/drivers/ieee1394/sbp2.c
===
--- linux-2.6.20-rc2.orig/drivers/ieee1394/sbp2.c
+++ linux-2.6.20-rc2/drivers/ieee1394/sbp2.c
@@ -490,11 +490,11 @@ static int sbp2util_create_command_orb_p
spin_unlock_irqrestore(&lu->cmd_orb_lock, flags);
return -ENOMEM;
}
-   cmd->command_orb_dma = dma_map_single(&hi->host->device,
+   cmd->command_orb_dma = dma_map_single(hi->host->device.parent,
&cmd->command_orb,
sizeof(struct sbp2_command_orb),
DMA_TO_DEVICE);
-   cmd->sge_dma = dma_map_single(&hi->host->device,
+   cmd->sge_dma = dma_map_single(hi->host->device.parent,
&cmd->scatter_gather_element,
sizeof(cmd->scatter_gather_element),
DMA_BIDIRECTIONAL);
@@ -516,10 +516,11 @@ static void sbp2util_remove_command_orb_
if (!list_empty(&lu->cmd_orb_completed))
list_for_each_safe(lh, next, &lu->cmd_orb_completed) {
cmd = list_entry(lh, struct sbp2_command_info, list);
-   dma_unmap_single(&host->device, cmd->command_orb_dma,
+   dma_unmap_single(host->device.parent,
+cmd->command_orb_dma,
 sizeof(struct sbp2_command_orb),
 DMA_TO_DEVICE);
-   dma_unmap_single(&host->device, cmd->sge_dma,
+   dma_unmap_single(host->device.parent, cmd->sge_dma,
 sizeof(cmd->scatter_gather_element),
 DMA_BIDIRECTIONAL);
kfree(cmd);
@@ -601,17 +602,17 @@ static void sbp2util_mark_command_comple
 
if (cmd->cmd_dma) {
if (cmd->dma_type == CMD_DMA_SINGLE)
-   dma_unmap_single(&host->device, cmd->cmd_dma,
+   dma_unmap_single(host->device.parent, cmd->cmd_dma,
 cmd->dma_size, cmd->dma_dir);
else if (cmd->dma_type == CMD_DMA_PAGE)
-   dma_unmap_page(&host->device, cmd->cmd_dma,
+   dma_unmap_page(host->device.parent, cmd->cmd_dma,
   cmd->dma_size, cmd->dma_dir);
/* XXX: Check for CMD_DMA_NONE bug */
cmd->dma_type = CMD_DMA_NONE;
cmd->cmd_dma = 0;
}
if (cmd->sge_buffer) {
-   dma_unmap_sg(&host->device, cmd->sge_buffer,
+   dma_unmap_sg(host->device.parent, cmd->sge_buffer,
 cmd->dma_size, cmd->dma_dir);
cmd->sge_buffer = NULL;
}
@@ -836,37 +837,37 @@ static int sbp2_start_device(struct sbp2
struct sbp2_fwhost_info *hi = lu->hi;
int error;
 
-   lu->login_response = dma_alloc_coherent(&hi->host->device,
+   lu->login_response = dma_alloc_coherent(hi->host->device.parent,
   

Re: Ok, explained.. (was Re: [PATCH] mm: fix page_mkclean_one)

2006-12-29 Thread Linus Torvalds


On Fri, 29 Dec 2006, Andrew Morton wrote:
> 
> - The above change means that we do extra writeout.  If a page is dirtied
>   once, kjournald will write it and then pdflush will come along and
>   needlessly write it again.

There's zero extra writeout for any flushing that flushes BY PAGES.

Only broken flushers that flush by buffer heads (which really really 
really shouldn't be done any more: welcome to the 21st century) will cause 
extra writeouts. And those extra writeouts are obviously required for all 
the dirty state to actually hit the disk - which is the point of the 
patch.

So they're not "extra" - they are "required for correct working".

But I can't stress the fact enough that people SHOULD NOT do writeback by 
buffer heads. The buffer head has been purely an "IO entity" for the last 
several years now, and it's not a cache entity. Anybody who does writeback 
by buffer heads is basically bypassing the real cache (the page cache), 
and that's why all the problems happen.

I think ext3 is terminally crap by now. It still uses buffer heads in 
places where it really really shouldn't, and as a result, things like 
directory accesses are simply slower than they should be. Sadly, I don't 
think ext4 is going to fix any of this, either.

It's all just too inherently wrongly designed around the buffer head 
(which was correct in 1995, but hasn't been correct for a long time in the 
kernel any more).

> - Poor old IO accounting broke again.

No. That's why I used "set_page_dirty()" and did it that strange ugly way 
("set page dirty, even though it's already dirty, and even though the very 
next thing we will do is TestClearPageDirty???").

That code looks strange as a result, which is why it now has more comments 
on it than actual code ;)

> - People were saying that ext2 and ext3,data=writeback were also showing
>   corruption.  What's up with that?

I thought the "ext3,data=writeback" case was reported to be fine by 
several people?

I'm not sure about ext2. I didn't look at what it did based on buffer 
heads. I would have expected it to be ok.

That said, at least one report was later shown to be bogus (errors due to
running out of disk space, not due to actual errors ;).

> - For a long time I've wanted to nuke the current ext3/jbd ordered-data
>   implementation altogether, and just make kjournald call into the
>   standard writeback code to do a standard superblock->inodes->pages walk.

I really would like to see less of the buffer-head-based stuff, and yes, 
more of the normal inode page walking. I don't think you can "order" 
accesses within a page anyway, exactly because of memory mapping issues, 
so any page ordering is not about buffer heads on the page itself, it 
should be purely about metadata.

> - It's pretty obnoxious that the VM now sets a clean page "dirty" and
>   then proceeds to modify its contents.  It would be nice to stop doing
>   that.

No. I think this is really the fundamental confusion people had. People
thought that setting the page dirty meant that it was no longer being 
modified. It hasn't meant that in a LONG time - ever since the whole 
DIRTY_TAG thing, the most important part of the PG_dirty thing has really 
been that it's now efficiently findable by the writeout logic.

And that is very much what the whole page accounting _depends_ on. When we 
mmap a page, we need to mark it "findable" as dirty _before_ people 
actually start writing to it, because it's too late afterwards.

>   We could stop marking the page dirty in do_wp_page() and create a new
>   VM counter "NR_PTE_DIRTY", which means
> 
> "number of mapping_cap_account_dirty() pages which have a dirty pte
> pointing at them".

Well, then you need to change what PAGECACHE_TAG_DIRTY means too.

That's very fundamental. That DIRTY _tag_ is now even more important than 
the PG_dirty bit itself, since that's what we actually use to _access_ 
those things.

Linus


Re: Ok, explained.. (was Re: [PATCH] mm: fix page_mkclean_one)

2006-12-29 Thread Andrew Morton
On Fri, 29 Dec 2006 14:16:32 -0800
Andrew Morton <[EMAIL PROTECTED]> wrote:

> - Poor old IO accounting broke again.

No it didn't - we're relying upon the behaviour of __set_page_dirty_buffers()
against an already-dirty page.


Re: [patch] remove MAX_ARG_PAGES

2006-12-29 Thread Ingo Molnar

[Cc:-ed Ulrich too]

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> On Fri, 29 Dec 2006, Russell King wrote:
> > 
> > Suggest you test (eg) a rebuild of libX11 to see how it reacts to 
> > this patch.
> 
> Also: please rebuild "xargs" and install first. Otherwise, a lot of 
> build script etc that use "xargs" won't ever trigger the new limits 
> (or lack thereof), because xargs will have been installed with some 
> old limits.

yeah, and i think the default chunking of xargs should still remain 
128K.

If it's fine for a script to get chunked input, and if the script has no
security relevance (xargs is fundamentally unsafe if any portion of the
VFS namespace it is used on is untrusted), then there's no problem with
the xargs limit staying at 128K.
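Chunking means splitting one long argument list into several exec() calls that each stay under a byte limit. The simplified model below assumes each argument costs strlen()+1 bytes and ignores the argv pointer array and the environment. Real xargs accounts for more than this and reports an error for an argument that cannot fit. count_batches() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define XARGS_CHUNK 131072  /* the 128K default discussed above */

/* Number of exec() batches needed so that each batch's argument bytes
 * stay within 'limit'. An oversized argument simply gets its own batch. */
static size_t count_batches(char *const args[], size_t argc, size_t limit)
{
	size_t batches = 0, used = 0;

	for (size_t i = 0; i < argc; i++) {
		size_t cost = strlen(args[i]) + 1;  /* string plus its NUL */

		if (used > 0 && used + cost > limit)
			used = 0;                   /* batch full: start another */
		if (used == 0)
			batches++;
		used += cost;
	}
	return batches;
}
```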

> Perhaps more worrying is if compiling xargs under a new kernel then 
> means that it won't work correctly under an old one.

xargs has its limit hardcoded AFAICS, it's based on:

#define ARG_MAX       131072  /* # bytes of args + environ for exec() */

i'd not change that just yet. The sysconf(3) manpage says it's generally 
unreliable:

  BUGS
   It is difficult to use ARG_MAX because it is not specified how much  of
   the  argument  space  for  exec() is consumed by the user's environment
   variables.
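The BUGS note refers to the environment sharing the same space as the arguments. The sketch below estimates what is left, assuming a POSIX system. arg_space_left() and environ_bytes() are illustrative names. The pointer arrays are not counted, so the result is only an upper bound.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <unistd.h>

extern char **environ;  /* POSIX: the current process environment */

/* Bytes taken by the environment strings, including their NULs.
 * The envp pointer array itself is not counted. */
static size_t environ_bytes(void)
{
	size_t total = 0;

	for (char **e = environ; e && *e; e++)
		total += strlen(*e) + 1;
	return total;
}

/* Rough room left for arguments: sysconf(_SC_ARG_MAX) minus the
 * environment. Returns -1 if no limit can be determined. */
static long arg_space_left(void)
{
	long max = sysconf(_SC_ARG_MAX);

	if (max < 0)
		return -1;
	return max - (long)environ_bytes();
}
```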

but ... as it is with every limit, it is always possible to write an
application that hardcodes a larger limit and then doesn't work when
running with the lower limit. Would that have been a correct argument
against, say, raising the user stack limit from the historic 1MB?

right now some of my (more stupid) scripts occasionally break if any 
random portion of my VFS namespace grows over the silly 128K limit. (and 
it rarely has the tendency to shrink, sadly) I think that is just as 
much of a legitimate problem as any naive newly written script not 
working on an older kernel on a huge VFS namespace. (in fact i could 
argue for it to be a more legitimate problem than other stupid scripts 
not being backwards compatible, not the least because it is a problem 
with /my/ scripts ;-)

we could try something like adding an ARG_MAX rlimit, but i think that 
would be overdoing it ... we could also do a sysctl as a global limit - 
equally pointless because distros will likely tweak it up anyway, and in 
any case neither measure really prevents the writing of stupid scripts.

Ingo


Re: No sound in KDE with intel hda since 2.6.20-rc1

2006-12-29 Thread Ismail Dönmez
On Friday 29 December 2006 09:18, Michael S. Tsirkin wrote:
> > Since 2.6.20-rc1 (tested both -rc1 and rc2), system notification sounds
> > under KDE, and sound in games (e.g. TuxPaint) no longer seem to work on
> > my T60 thinkpad. Works fine under 2.6.19 though.  The strange thing is
> > e.g. Amarok still plays music fine.
>
> Tis is on Kubuntu 6.06, BTW.

Same on Pardus 2007 which uses KDE 3.5.5.

Regards,
ismail

-- 
2 + 2 = 5 for very large values of 2


Re: Ok, explained.. (was Re: [PATCH] mm: fix page_mkclean_one)

2006-12-29 Thread Andrew Morton
On Fri, 29 Dec 2006 02:48:35 -0800 (PST)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> + if (mapping && mapping_cap_account_dirty(mapping)) {
> + /*
> +  * Yes, Virginia, this is indeed insane.
> +  *
> +  * We use this sequence to make sure that
> +  *  (a) we account for dirty stats properly
> +  *  (b) we tell the low-level filesystem to
> +  *  mark the whole page dirty if it was
> +  *  dirty in a pagetable. Only to then
> +  *  (c) clean the page again and return 1 to
> +  *  cause the writeback.
> +  *
> +  * This way we avoid all nasty races with the
> +  * dirty bit in multiple places and clearing
> +  * them concurrently from different threads.
> +  *
> +  * Note! Normally the "set_page_dirty(page)"
> +  * has no effect on the actual dirty bit - since
> +  * that will already usually be set. But we
> +  * need the side effects, and it can help us
> +  * avoid races.
> +  *
> +  * We basically use the page "master dirty bit"
> +  * as a serialization point for all the different
> +  * threds doing their things.
> +  *
> +  * FIXME! We still have a race here: if somebody
> +  * adds the page back to the page tables in
> +  * between the "page_mkclean()" and the "TestClearPageDirty()",
> +  * we might have it mapped without the dirty bit set.
> +  */
> + if (page_mkclean(page))
> + set_page_dirty(page);
> + if (TestClearPageDirty(page)) {
>   dec_zone_page_state(page, NR_FILE_DIRTY);
> + return 1;
>   }
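The quoted sequence depends on the page's master dirty bit being changed only by atomic test-and-set and test-and-clear operations. In the user-space model below, only the clean-to-dirty transition is counted. That is the property the follow-up message says the accounting relies on when set_page_dirty() hits an already-dirty page. The model uses C11 atomics. toy_page and the toy_* functions are hypothetical, and the FIXME race with a page being re-mapped is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy page: the "master" dirty bit plus a flag standing in for
 * "some pte maps this page and is dirty". */
struct toy_page {
	atomic_bool dirty;
	atomic_bool pte_dirty;
};

static atomic_long nr_file_dirty;  /* stand-in for NR_FILE_DIRTY */

/* set_page_dirty() analogue: only the clean->dirty transition is counted. */
static void toy_set_page_dirty(struct toy_page *p)
{
	if (!atomic_exchange(&p->dirty, true))
		atomic_fetch_add(&nr_file_dirty, 1);
}

/* page_mkclean() analogue: clean the pte state, report if it was dirty. */
static bool toy_page_mkclean(struct toy_page *p)
{
	return atomic_exchange(&p->pte_dirty, false);
}

/* The quoted sequence: fold pte dirtiness into the master bit, then
 * atomically test-and-clear it. Returns 1 if the page must be written. */
static int toy_clear_dirty_for_io(struct toy_page *p)
{
	if (toy_page_mkclean(p))
		toy_set_page_dirty(p);
	if (atomic_exchange(&p->dirty, false)) {
		atomic_fetch_sub(&nr_file_dirty, 1);
		return 1;
	}
	return 0;
}
```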

- Presumably reiser3's ordered-data mode has the same problem.  And ext4,
  of course.  Dunno about other filesystems.

- The above change means that we do extra writeout.  If a page is dirtied
  once, kjournald will write it and then pdflush will come along and
  needlessly write it again.

  But otoh, if a mapping is being repeatedly dirtied, kjournald will
  write the page once per 30 seconds (dirty_expire_centisecs) and pdflush
  will write the page once per 30 seconds as well.  But we _should_ be
  writing it once per five seconds (kjournald commit interval).  So we're
  still ahead ;)

- Poor old IO accounting broke again.

- People were saying that ext2 and ext3,data=writeback were also showing
  corruption.  What's up with that?

- For a long time I've wanted to nuke the current ext3/jbd ordered-data
  implementation altogether, and just make kjournald call into the
  standard writeback code to do a standard superblock->inodes->pages walk.

  I think it'd be fairly straightforward to do.  We'd need to teach the
  writeback code to be able to skip dirty pages which don't have a disk
  mapping, so that kjournald doesn't end up waiting for kjournald to free
  up journal space..

  Would need to avoid possible deadlocks where someone calls
  ext3_force_commit() or otherwise does a synchronous commit while holding
  VFS locks.

  reiser3 and ext4 could be converted too.

  Not a short-term project, but this would avoid the problem.

- It's pretty obnoxious that the VM now sets a clean page "dirty" and
  then proceeds to modify its contents.  It would be nice to stop doing
  that.

  We could stop marking the page dirty in do_wp_page() and create a new
  VM counter "NR_PTE_DIRTY", which means

"number of mapping_cap_account_dirty() pages which have a dirty pte
pointing at them".

  Or, perhaps

"number of dirty ptes which point at mapping_cap_account_dirty() pages".

  Which can be larger, but the writeout code will probably cope.

  Then we take NR_PTE_DIRTY into account in the dirty-page balancing act.
  So

  - do_wp_page() will still run balance_dirty_pages()

  - but it would no longer run set_page_dirty().

  - But it needs to run mark_inode_dirty() so the fs-writeback code
notices the file.

  - And mapping_tagged(mapping, PAGECACHE_TAG_DIRTY) becomes insufficient.

  The tricky part here is "how do we do the writeback"?  The
  pte-dirty,!PageDirty pages aren't tagged as dirty in the radix-tree and
  writeback needs to find them so that it can effectively do an msync() on
  them.  Walking all the mm's and vma's would be insane.  Visiting all the
  pages in the file would also probably be insane.

  Perhaps this can be solved by adding a new radix-tree tag which means
  "this page might have dirty ptes pointing at it".  For each file
  writeback would do a radix-tree walk of these pages,
  cleaning-and-write-protecting ptes, marking the corresponding pages
  dirty and clearing their PAGECACHE_TAG_PTE_DIRTY tags.

  Then we can fix the mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)
  problem by doing

mapping_tagg
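The tag proposal in the message above can be modelled with a per-page bitmask standing in for radix-tree tags. In this model, TAG_PTE_DIRTY plays the proposed PAGECACHE_TAG_PTE_DIRTY, which does not exist yet. The linear scan stands in for a tagged radix-tree lookup, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8  /* tiny "file" for the sketch */

/* TAG_DIRTY plays PAGECACHE_TAG_DIRTY; TAG_PTE_DIRTY is the proposed tag. */
enum { TAG_DIRTY = 1u << 0, TAG_PTE_DIRTY = 1u << 1 };

struct toy_mapping {
	unsigned tags[NPAGES];
	bool pte_dirty[NPAGES];   /* "some pte for this page is dirty" */
};

/* Visit only pages carrying TAG_PTE_DIRTY: clean (write-protect) the pte,
 * move its dirtiness to the page, clear the tag. Returns pages visited. */
static size_t toy_sweep_pte_dirty(struct toy_mapping *m)
{
	size_t visited = 0;

	for (size_t i = 0; i < NPAGES; i++) {
		if (!(m->tags[i] & TAG_PTE_DIRTY))
			continue;  /* a tagged radix-tree walk would skip these */
		visited++;
		if (m->pte_dirty[i]) {
			m->pte_dirty[i] = false;
			m->tags[i] |= TAG_DIRTY;
		}
		m->tags[i] &= ~(unsigned)TAG_PTE_DIRTY;
	}
	return visited;
}

/* mapping_tagged() analogue: does any page carry 'tag'? */
static bool toy_mapping_tagged(const struct toy_mapping *m, unsigned tag)
{
	for (size_t i = 0; i < NPAGES; i++)
		if (m->tags[i] & tag)
			return true;
	return false;
}
```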

Re: Three if-clauses of constant logic value; char drivers for kernel 2.4.33.3

2006-12-29 Thread Willy Tarreau
On Tue, Dec 19, 2006 at 09:59:37PM +0100, Mats Erik Andersson wrote:
> Hi there, all masters of kernel code,

Hi !

> I just discovered that the kernel code for 2.4.33.3 contains three
> if-statements that never can change their values, whence they should
> be repaired or eliminated. In source directory linux/drivers/char the
> files vt.c and keyboard.c produce these warning upon compilation:
> 
> vt.c:166: varning: comparison is always false due to limited range  
>   of data type
> vt.c:289: varning: comparison is always false due to limited range
>   of data type
> keyboard.c:640: varning: comparison is always true due to limited
> range of data type
> 
> I did the compilation with gcc 3.3.5 on Debian Sarge. This behaviour
> appeared first for kernel 2.2.19, since I wanted to revive the old
> minirtl edition, but to my surprise the same warnings appear also
> with the brand new kernel 2.4.33.3.

OK, thanks for reporting this. I'll take a look at those before the next
release.

BTW, sorry, I missed your post. When posting 2.4-related mails, please
try to put the word "2.4" close to the beginning of the subject so that
my eyes can notice it among the 1 other monthly messages
on LKML.

> Best regards
>  Mats Erik Andersson, PhD
>  [EMAIL PROTECTED]
>  [EMAIL PROTECTED]

best regards,
Willy
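For reference, this gcc warning fires when one operand's type cannot hold the value it is compared against. The vt.c and keyboard.c lines are not quoted in the thread, so the following is only a hypothetical illustration of the pattern. It shows one common fix, which is to check the wider value before narrowing it. store_checked() is an invented helper.

```c
#include <assert.h>
#include <limits.h>

/*
 * With 'unsigned char c', a test such as
 *     if (c > 255) ...
 * can never be true when UCHAR_MAX is 255; gcc reports it as
 * "comparison is always false due to limited range of data type".
 * Doing the range check on the wider type keeps the test meaningful.
 */
static int store_checked(int value, unsigned char *out)
{
	if (value < 0 || value > UCHAR_MAX)
		return -1;            /* reject instead of silently truncating */
	*out = (unsigned char)value;
	return 0;
}
```

The other usual fix is simply deleting a test that the type already guarantees.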



Re: PROBLEM: setup apm as module version 2.4.34

2006-12-29 Thread Willy Tarreau
Hi,

On Fri, Dec 29, 2006 at 07:28:55PM +0100, Dr.-Ing. Ingo D. Rullhusen wrote:
> Hello,
> 
> i hope that's the right address for this little problem, which arises 
> with linux kernel 2.4.34.

Yes, it's the right address.

> If i compile the Advanced Power Management as module it do not work. If 
> i try a depmod i get an unresolved symbols message and so it cannot be 
> loaded of course.
> 
> But if the APM part is compiled into the kernel directly it works.
> 
> Simply disable the compile as module option?

I'm sorry, but could you be a bit more precise: config, error messages?
Also, is this problem a regression (i.e., did it work on a known previous
version, and if so, which one)?
> Thanks
>   Ingo

Regards,
Willy



[PATCH] ondemand governor use new cpufreq rwsem locking in work callback

2006-12-29 Thread Venkatesh Pallipadi


Eliminate flush_workqueue in cpufreq_governor(STOP) callpath. Using flush
there has a deadlock potential as in 

http://uwsg.iu.edu/hypermail/linux/kernel/0611.3/1223.html

Also, clean up the locking issues with the do_dbs_timer delayed_work callback.
As it changes the CPU frequency using __cpufreq_target, it needs to hold
policy_rwsem in write mode, which also protects it from CPU hotplug.
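The locking change can be sketched in user space with a POSIX rwlock. The model rests on these assumptions:
- struct toy_dbs and its functions are hypothetical.
- The stop path takes the lock itself. The patch's governor STOP path is assumed to run with the policy rwsem already held.
- "requeued" counts re-arms instead of queueing real work.

The sketch shows why the enable check must happen under the lock. Once the stopper has cleared the flag, the callback cannot re-arm, so no flush is needed. Flushing instead, while holding the lock, could wait on a callback that is itself waiting for that lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct toy_dbs {
	pthread_rwlock_t policy_rwsem; /* stands in for the per-CPU policy rwsem */
	bool enable;
	int requeued;                  /* counts queue_delayed_work_on() stand-ins */
};

/* Work callback: take the lock for writing first, then test 'enable'. */
static void toy_do_dbs_timer(struct toy_dbs *d)
{
	if (pthread_rwlock_wrlock(&d->policy_rwsem) != 0)
		return;
	if (!d->enable) {
		pthread_rwlock_unlock(&d->policy_rwsem);
		return;                /* stopped: do not re-arm */
	}
	/* ... sample load and change frequency here ... */
	d->requeued++;                 /* re-arm the timer */
	pthread_rwlock_unlock(&d->policy_rwsem);
}

/* Stop: clear 'enable' under the lock instead of flushing the workqueue. */
static void toy_dbs_stop(struct toy_dbs *d)
{
	pthread_rwlock_wrlock(&d->policy_rwsem);
	d->enable = false;
	pthread_rwlock_unlock(&d->policy_rwsem);
}
```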

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>

Index: linux-2.6.20-rc-mm/drivers/cpufreq/cpufreq_ondemand.c
===
--- linux-2.6.20-rc-mm.orig/drivers/cpufreq/cpufreq_ondemand.c
+++ linux-2.6.20-rc-mm/drivers/cpufreq/cpufreq_ondemand.c
@@ -437,8 +437,14 @@ static void do_dbs_timer(struct work_str
 
delay -= jiffies % delay;
 
-   if (!dbs_info->enable)
+   if (lock_policy_rwsem_write(cpu) < 0)
return;
+
+   if (!dbs_info->enable) {
+   unlock_policy_rwsem_write(cpu);
+   return;
+   }
+
/* Common NORMAL_SAMPLE setup */
dbs_info->sample_type = DBS_NORMAL_SAMPLE;
if (!dbs_tuners_ins.powersave_bias ||
@@ -455,6 +461,7 @@ static void do_dbs_timer(struct work_str
CPUFREQ_RELATION_H);
}
queue_delayed_work_on(cpu, kondemand_wq, &dbs_info->work, delay);
+   unlock_policy_rwsem_write(cpu);
 }
 
 static inline void dbs_timer_init(struct cpu_dbs_info_s *dbs_info)
@@ -463,6 +470,7 @@ static inline void dbs_timer_init(struct
int delay = usecs_to_jiffies(dbs_tuners_ins.sampling_rate);
delay -= jiffies % delay;
 
+   dbs_info->enable = 1;
ondemand_powersave_bias_init();
dbs_info->sample_type = DBS_NORMAL_SAMPLE;
INIT_DELAYED_WORK_NAR(&dbs_info->work, do_dbs_timer);
@@ -474,7 +482,6 @@ static inline void dbs_timer_exit(struct
 {
dbs_info->enable = 0;
cancel_delayed_work(&dbs_info->work);
-   flush_workqueue(kondemand_wq);
 }
 
 static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
@@ -503,21 +510,9 @@ static int cpufreq_governor_dbs(struct c
 
mutex_lock(&dbs_mutex);
dbs_enable++;
-   if (dbs_enable == 1) {
-   kondemand_wq = create_workqueue("kondemand");
-   if (!kondemand_wq) {
-   printk(KERN_ERR
-"Creation of kondemand failed\n");
-   dbs_enable--;
-   mutex_unlock(&dbs_mutex);
-   return -ENOSPC;
-   }
-   }
 
rc = sysfs_create_group(&policy->kobj, &dbs_attr_group);
if (rc) {
-   if (dbs_enable == 1)
-   destroy_workqueue(kondemand_wq);
dbs_enable--;
mutex_unlock(&dbs_mutex);
return rc;
@@ -532,7 +527,6 @@ static int cpufreq_governor_dbs(struct c
j_dbs_info->prev_cpu_wall = get_jiffies_64();
}
this_dbs_info->cpu = cpu;
-   this_dbs_info->enable = 1;
/*
 * Start the timerschedule work, when this governor
 * is used for first time
@@ -562,9 +556,6 @@ static int cpufreq_governor_dbs(struct c
dbs_timer_exit(this_dbs_info);
sysfs_remove_group(&policy->kobj, &dbs_attr_group);
dbs_enable--;
-   if (dbs_enable == 0)
-   destroy_workqueue(kondemand_wq);
-
mutex_unlock(&dbs_mutex);
 
break;
@@ -593,12 +584,18 @@ static struct cpufreq_governor cpufreq_g
 
 static int __init cpufreq_gov_dbs_init(void)
 {
+   kondemand_wq = create_workqueue("kondemand");
+   if (!kondemand_wq) {
+   printk(KERN_ERR "Creation of kondemand failed\n");
+   return -EFAULT;
+   }
return cpufreq_register_governor(&cpufreq_gov_dbs);
 }
 
 static void __exit cpufreq_gov_dbs_exit(void)
 {
cpufreq_unregister_governor(&cpufreq_gov_dbs);
+   destroy_workqueue(kondemand_wq);
 }
 
 
@@ -610,3 +607,4 @@ MODULE_LICENSE("GPL");
 
 module_init(cpufreq_gov_dbs_init);
 module_exit(cpufreq_gov_dbs_exit);
+


[PATCH] ondemand governor restructure the work callback

2006-12-29 Thread Venkatesh Pallipadi

Restructure the delayed_work callback in ondemand.

This eliminates the need for smp_processor_id in the callback function and also
helps in proper locking and avoiding flush_workqueue when stopping the governor
(done in subsequent patch).
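The restructure relies on container_of(). The callback receives a pointer to the embedded work_struct and walks back to the enclosing per-CPU structure, which now stores its own cpu. Below is a user-space sketch with simplified stand-ins for the kernel types. The macro omits the kernel's type checking, and toy_work_cpu() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(): no type checking, unlike the kernel's. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Toy versions of the structures involved; not the kernel definitions. */
struct work_struct { int pending; };
struct delayed_work { struct work_struct work; long timer; };

enum { DBS_NORMAL_SAMPLE, DBS_SUB_SAMPLE };

struct cpu_dbs_info_s {
	unsigned int freq_lo;
	struct delayed_work work;
	int cpu;
	unsigned int enable:1,
		     sample_type:1;  /* one bit holds both sample types */
};

/* Recover the per-CPU structure from the work pointer, so the callback
 * no longer needs smp_processor_id() to find its CPU. */
static int toy_work_cpu(struct work_struct *work)
{
	struct cpu_dbs_info_s *dbs_info =
		container_of(work, struct cpu_dbs_info_s, work.work);

	return dbs_info->cpu;
}
```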

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>

Index: linux-2.6.20-rc-mm/drivers/cpufreq/cpufreq_ondemand.c
===
--- linux-2.6.20-rc-mm.orig/drivers/cpufreq/cpufreq_ondemand.c
+++ linux-2.6.20-rc-mm/drivers/cpufreq/cpufreq_ondemand.c
@@ -52,19 +52,20 @@ static unsigned int def_sampling_rate;
 static void do_dbs_timer(struct work_struct *work);
 
 /* Sampling types */
-enum dbs_sample {DBS_NORMAL_SAMPLE, DBS_SUB_SAMPLE};
+enum {DBS_NORMAL_SAMPLE, DBS_SUB_SAMPLE};
 
 struct cpu_dbs_info_s {
cputime64_t prev_cpu_idle;
cputime64_t prev_cpu_wall;
struct cpufreq_policy *cur_policy;
struct delayed_work work;
-   enum dbs_sample sample_type;
-   unsigned int enable;
struct cpufreq_frequency_table *freq_table;
unsigned int freq_lo;
unsigned int freq_lo_jiffies;
unsigned int freq_hi_jiffies;
+   int cpu;
+   unsigned int enable:1,
+sample_type:1;
 };
 static DEFINE_PER_CPU(struct cpu_dbs_info_s, cpu_dbs_info);
 
@@ -402,7 +403,7 @@ static void dbs_check_cpu(struct cpu_dbs
if (load < (dbs_tuners_ins.up_threshold - 10)) {
unsigned int freq_next, freq_cur;
 
-   freq_cur = cpufreq_driver_getavg(policy);
+   freq_cur = __cpufreq_driver_getavg(policy);
if (!freq_cur)
freq_cur = policy->cur;
 
@@ -423,9 +424,11 @@ static void dbs_check_cpu(struct cpu_dbs
 
 static void do_dbs_timer(struct work_struct *work)
 {
-   unsigned int cpu = smp_processor_id();
-   struct cpu_dbs_info_s *dbs_info = &per_cpu(cpu_dbs_info, cpu);
-   enum dbs_sample sample_type = dbs_info->sample_type;
+   struct cpu_dbs_info_s *dbs_info =
+   container_of(work, struct cpu_dbs_info_s, work.work);
+   unsigned int cpu = dbs_info->cpu;
+   int sample_type = dbs_info->sample_type;
+
/* We want all CPUs to do sampling nearly on same jiffy */
int delay = usecs_to_jiffies(dbs_tuners_ins.sampling_rate);
 
@@ -454,17 +457,17 @@ static void do_dbs_timer(struct work_str
queue_delayed_work_on(cpu, kondemand_wq, &dbs_info->work, delay);
 }
 
-static inline void dbs_timer_init(unsigned int cpu)
+static inline void dbs_timer_init(struct cpu_dbs_info_s *dbs_info)
 {
-   struct cpu_dbs_info_s *dbs_info = &per_cpu(cpu_dbs_info, cpu);
/* We want all CPUs to do sampling nearly on same jiffy */
int delay = usecs_to_jiffies(dbs_tuners_ins.sampling_rate);
delay -= jiffies % delay;
 
ondemand_powersave_bias_init();
-   INIT_DELAYED_WORK_NAR(&dbs_info->work, do_dbs_timer);
dbs_info->sample_type = DBS_NORMAL_SAMPLE;
-   queue_delayed_work_on(cpu, kondemand_wq, &dbs_info->work, delay);
+   INIT_DELAYED_WORK_NAR(&dbs_info->work, do_dbs_timer);
+   queue_delayed_work_on(dbs_info->cpu, kondemand_wq, &dbs_info->work,
+ delay);
 }
 
 static inline void dbs_timer_exit(struct cpu_dbs_info_s *dbs_info)
@@ -528,6 +531,7 @@ static int cpufreq_governor_dbs(struct c
j_dbs_info->prev_cpu_idle = get_cpu_idle_time(j);
j_dbs_info->prev_cpu_wall = get_jiffies_64();
}
+   this_dbs_info->cpu = cpu;
this_dbs_info->enable = 1;
/*
 * Start the timerschedule work, when this governor
@@ -548,7 +552,7 @@ static int cpufreq_governor_dbs(struct c
 
dbs_tuners_ins.sampling_rate = def_sampling_rate;
}
-   dbs_timer_init(policy->cpu);
+   dbs_timer_init(this_dbs_info);
 
mutex_unlock(&dbs_mutex);
break;


[PATCH] Rewrite lock in cpufreq to eliminate cpufreq/hotplug related issues

2006-12-29 Thread Venkatesh Pallipadi


Yet another attempt to resolve cpufreq and hotplug locking issues.

Patchset has 3 patches:
* Rewrite the lock infrastructure of cpufreq using a per cpu rwsem.
* Minor restructuring of work callback in ondemand driver.
* Use the new cpufreq rwsem infrastructure in ondemand work.

This patch:

Convert policy->lock to rwsem and move it to per_cpu area.
This rwsem will protect against both changing/accessing policy
related parameters and CPU hot plug/unplug.

Cc: Gautham R Shenoy <[EMAIL PROTECTED]>
Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>

Index: linux-2.6.20-rc-mm/drivers/cpufreq/cpufreq.c
===
--- linux-2.6.20-rc-mm.orig/drivers/cpufreq/cpufreq.c
+++ linux-2.6.20-rc-mm/drivers/cpufreq/cpufreq.c
@@ -41,6 +41,64 @@ static struct cpufreq_driver *cpufreq_dr
 static struct cpufreq_policy *cpufreq_cpu_data[NR_CPUS];
 static DEFINE_SPINLOCK(cpufreq_driver_lock);
 
+/*
+ * cpu_policy_rwsem is a per CPU reader-writer semaphore designed to cure
+ * all cpufreq/hotplug/workqueue/etc related lock issues.
+ *
+ * The rules for this semaphore:
+ * - Any routine that wants to read from the policy structure will
+ *   do a down_read on this semaphore.
+ * - Any routine that will write to the policy structure and/or may take away
+ *   the policy altogether (eg. CPU hotplug), will hold this lock in write
+ *   mode before doing so.
+ *
+ * Additional rules:
+ * - All holders of the lock should check to make sure that the CPU they
+ *   are concerned with are online after they get the lock.
+ * - Governor routines that can be called in cpufreq hotplug path should not
+ *   take this sem as top level hotplug notifier handler takes this.
+ */
+static DEFINE_PER_CPU(int, policy_cpu);
+static DEFINE_PER_CPU(struct rw_semaphore, cpu_policy_rwsem);
+
+#define lock_policy_rwsem(mode, cpu)   \
+int lock_policy_rwsem_##mode   \
+(int cpu)  \
+{  \
+   int policy_cpu = per_cpu(policy_cpu, cpu);  \
+   BUG_ON(policy_cpu == -1);   \
+   down_##mode(&per_cpu(cpu_policy_rwsem, policy_cpu));\
+   if (unlikely(!cpu_online(cpu))) {   \
+   up_##mode(&per_cpu(cpu_policy_rwsem, policy_cpu));  \
+   return -1;  \
+   }   \
+   \
+   return 0;   \
+}
+
+lock_policy_rwsem(read, cpu);
+EXPORT_SYMBOL_GPL(lock_policy_rwsem_read);
+
+lock_policy_rwsem(write, cpu);
+EXPORT_SYMBOL_GPL(lock_policy_rwsem_write);
+
+void unlock_policy_rwsem_read(int cpu)
+{
+   int policy_cpu = per_cpu(policy_cpu, cpu);
+   BUG_ON(policy_cpu == -1);
+   up_read(&per_cpu(cpu_policy_rwsem, policy_cpu));
+}
+EXPORT_SYMBOL_GPL(unlock_policy_rwsem_read);
+
+void unlock_policy_rwsem_write(int cpu)
+{
+   int policy_cpu = per_cpu(policy_cpu, cpu);
+   BUG_ON(policy_cpu == -1);
+   up_write(&per_cpu(cpu_policy_rwsem, policy_cpu));
+}
+EXPORT_SYMBOL_GPL(unlock_policy_rwsem_write);
+
+
 /* internal prototypes */
 static int __cpufreq_governor(struct cpufreq_policy *policy, unsigned int event);
 static void handle_update(struct work_struct *work);
@@ -415,10 +473,8 @@ static ssize_t store_##file_name   \
if (ret != 1)   \
return -EINVAL; \
\
-   mutex_lock(&policy->lock);  \
ret = __cpufreq_set_policy(policy, &new_policy);\
policy->user_policy.object = policy->object;\
-   mutex_unlock(&policy->lock);\
\
return ret ? ret : count;   \
 }
@@ -479,12 +535,10 @@ static ssize_t store_scaling_governor (s
 
/* Do not use cpufreq_set_policy here or the user_policy.max
   will be wrongly overridden */
-   mutex_lock(&policy->lock);
ret = __cpufreq_set_policy(policy, &new_policy);
 
policy->user_policy.policy = policy->policy;
policy->user_policy.governor = policy->governor;
-   mutex_unlock(&policy->lock);
 
if (ret)
return ret;
@@ -589,11 +643,17 @@ static ssize_t show(struct kobject * kob
policy = cpufreq_cpu_get(policy->cpu);
if (!policy)
return -EINVAL;
+
+   if

[PATCH] kobject: kobj->k_name verification fix

2006-12-29 Thread Martin Stoilov
The function 'kobject_add' tries to verify that the name of
a new kobject instance is properly set before continuing:

	if (!kobj->k_name)
		kobj->k_name = kobj->name;
	if (!kobj->k_name) {
		pr_debug("kobject attempted to be registered with no name!\n");
		WARN_ON(1);
		return -EINVAL;
	}

The statement:

	if (!kobj->k_name) {
		pr_debug("kobject attempted to be registered with no name!\n");
		WARN_ON(1);
		return -EINVAL;
	}

is useless as it stands, because it can never be true: if k_name was
NULL, it has just been pointed at the kobj->name buffer. I think the
code was intended to be:

	if (!kobj->k_name)
		kobj->k_name = kobj->name;
	if (!*kobj->k_name) {
		pr_debug("kobject attempted to be registered with no name!\n");
		WARN_ON(1);
		return -EINVAL;
	}

because this would make sure the kobj->name buffer has something in it.
So the missing '*' is just a typo. I would actually prefer an
expression like:

	if (*kobj->k_name == '\0') {
		pr_debug("kobject attempted to be registered with no name!\n");
		WARN_ON(1);
		return -EINVAL;
	}

because it makes the intention clear, but in this patch I just restore
the missing '*' without changing the coding style of the function.
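The difference between the two tests can be reproduced outside the kernel. The sketch below makes these assumptions: it uses a hypothetical struct that holds only a k_name/name pair, the buffer length is arbitrary, and the functions return -EINVAL the way kobject_add() does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NAME_LEN 20 /* assumption: stands in for the real name buffer size */

/* Hypothetical stand-in holding only the two fields kobject_add() checks. */
struct fake_kobject {
	const char *k_name;
	char name[NAME_LEN];
};

/* Original check: after the fallback, k_name points at the name array,
 * so testing the pointer can never catch an empty name. */
static int check_name_pointer(struct fake_kobject *kobj)
{
	if (!kobj->k_name)
		kobj->k_name = kobj->name;
	if (!kobj->k_name)
		return -EINVAL;
	return 0;
}

/* Patched check: dereference the pointer and reject an empty string. */
static int check_name_contents(struct fake_kobject *kobj)
{
	if (!kobj->k_name)
		kobj->k_name = kobj->name;
	if (!*kobj->k_name)
		return -EINVAL;
	return 0;
}
```

An object with an empty name passes the pointer test but is rejected by the contents test.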

It looks like the Thunderbird client replaces tabs with spaces even when I
choose 'paste without formatting', and I don't know how to insert the patch
intact in the body of the message, so I am attaching the patch.

Signed-off-by: Martin Stoilov <[EMAIL PROTECTED]>
---


diff -pNru linux-2.6.20-rc2/lib/kobject.c linux-2.6.20-rc2.mod/lib/kobject.c
--- linux-2.6.20-rc2/lib/kobject.c	2006-12-29 11:48:30.0 -0800
+++ linux-2.6.20-rc2.mod/lib/kobject.c	2006-12-29 11:50:42.0 -0800
@@ -167,7 +167,7 @@ int kobject_add(struct kobject * kobj)
 		return -ENOENT;
 	if (!kobj->k_name)
 		kobj->k_name = kobj->name;
-	if (!kobj->k_name) {
+	if (!*kobj->k_name) {
 		pr_debug("kobject attempted to be registered with no name!\n");
 		WARN_ON(1);
 		return -EINVAL;


Re: [patch] remove MAX_ARG_PAGES

2006-12-29 Thread Ingo Molnar

* Russell King <[EMAIL PROTECTED]> wrote:

> On Fri, Dec 29, 2006 at 09:03:57PM +0100, Ingo Molnar wrote:
> > FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to 
> > 2.6.20-rc2 and have included it in the -rt kernel. It's working great - 
> > i can now finally do a "ls -t patches/*.patch" in my patch repository - 
> > something i havent been able to do for years ;-)
> 
> How do the various autoconf stuff react to this?  Eg, I notice the 
> following in various configure scripts:
> 
> checking the maximum length of command line arguments... 32768

yes, that's how libtool works: it goes from 32K downwards to figure out
a maximum. I don't see a problem there.

you can find a few other variants at:

  
http://www.google.com/codesearch?q=%22checking+the+maximum+length+of+command+line+arguments%22&hl=en&btnG=Search+Code

worst case, the test command would get a segfault from the default stack
limit (8MB on Fedora).
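A program can also ask for these limits directly instead of probing the way libtool does. The sketch below uses the POSIX calls sysconf(_SC_ARG_MAX) and getrlimit(RLIMIT_STACK). Whether the reported ARG_MAX reflects a kernel without MAX_ARG_PAGES depends on the C library version, so treat the number as advisory.

```c
#include <assert.h>
#include <sys/resource.h>
#include <unistd.h>

/* Ask the C library for the argument-space limit; -1 if unavailable. */
static long query_arg_max(void)
{
	return sysconf(_SC_ARG_MAX);
}

/* Fetch the soft stack limit (may be RLIM_INFINITY); 0 on success. */
static int query_stack_limit(rlim_t *out)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_STACK, &rl) != 0)
		return -1;
	*out = rl.rlim_cur;
	return 0;
}
```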

Ingo


Re: 2.6.20-rc2: kernel BUG at include/asm/dma-mapping.h:110!

2006-12-29 Thread Benjamin Herrenschmidt

> Bisecting has identified this commit:
> 
> commit 9b7d9c096dd4e4baacc21b2588662bbb56f36c4e
> Author: Stefan Richter <[EMAIL PROTECTED]>
> Date:   Wed Nov 22 21:44:34 2006 +0100
> 
> ieee1394: sbp2: convert from PCI DMA to generic DMA
> 
> API conversion without change in functionality
> 
> Signed-off-by: Stefan Richter <[EMAIL PROTECTED]>
> 
> 
> I'm only seeing this on ppc64, ppc32 seems to be working fine.

The patch looks totally bogus to me. It's passing a random struct device
from the hpsb host data structure to the dma_map_* routines, which
can't do anything with it.

The dma_map_* routines only know about some bus types. That's always
been the case (that's why you also can't pass a USB device's struct
device to them, for example): mostly PCI, possibly others depending on
the platform.

So if you are going to pass a struct device pointer to dma_map_*, use the one
inside the pci_dev of the host. Or have the host driver provide you with
the struct device pointer (for PCI implementations that is the one from
the pci_dev; other implementations give you the device they sit on,
assuming the platform can do dma_* on that device).

Ben.




[PATCH 2/2] Handle error in sync_sb_inodes()

2006-12-29 Thread Guillaume Chazarain

Against 2.6.20-rc2, and now the bug fix.

--
Guillaume

I/O errors could go unnoticed when syncing. For example, the following code could
write a file bigger than 10MiB on a 10MiB filesystem. With this patch, msync()
reports the error originally encountered by sync(). The number of sync() calls
may need tuning to reproduce the bug.
make_file.c:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NR_SYNC 3 /* Adjust me if needed */
#define SIZE ((10 << 20) + (100 << 10))

int main(void)
{
	int i, fd;
	char *mapping;
	fd = open("mnt/file", O_RDWR | O_CREAT, 0600);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	if (ftruncate(fd, SIZE) < 0) {
		perror("ftruncate");
		return 1;
	}

	mapping = mmap(NULL, SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
	if (mapping == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	memset(mapping, 0xFF, SIZE);

	for (i = 0; i < NR_SYNC; i++)
		sync();

	if (msync(mapping, SIZE, MS_SYNC) < 0) {
		perror("msync");
		return 1;
	}

	if (close(fd) < 0) {
		perror("close");
		return 1;
	}

	puts("File written successfully => bad!");
	return 0;
}

#!/bin/sh

dd if=/dev/zero of=fs.10M bs=10M count=0 seek=1
mkfs.ext2 -qF fs.10M
mkdir mnt
mount fs.10M mnt -o loop
./make_file
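msync() can report an error that was hit during an earlier sync() because the error is not returned at the moment it happens. It is recorded as a flag on the address_space, and a later waiter tests and clears that flag. The sketch below is a minimal model of that lifecycle. It assumes a hypothetical struct with a single error field in place of the AS_EIO/AS_ENOSPC bits; the kernel uses atomic bit operations instead.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model: 0, -EIO or -ENOSPC waiting to be reported. */
struct sticky_mapping { int pending_error; };

/* Writeback pass (models sync()): records a failure but reports nothing. */
static void model_writeback(struct sticky_mapping *m, int ret)
{
	if (ret)
		m->pending_error = (ret == -ENOSPC) ? -ENOSPC : -EIO;
}

/* Later waiter (models msync(MS_SYNC)): reports the recorded error once. */
static int model_wait(struct sticky_mapping *m)
{
	int err = m->pending_error;

	m->pending_error = 0;
	return err;
}
```

Before this patch, sync_sb_inodes() discarded the result of __writeback_single_inode(), so in this model the first call to model_writeback() would never happen.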

Signed-off-by: Guillaume Chazarain <[EMAIL PROTECTED]>
---

 fs-writeback.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff -r 3859b1144d3a fs/fs-writeback.c
--- a/fs/fs-writeback.c	Sun Dec 24 05:00:03 2006 +
+++ b/fs/fs-writeback.c	Fri Dec 29 22:12:42 2006 +0100
@@ -316,6 +316,7 @@ sync_sb_inodes(struct super_block *sb, s
 		struct address_space *mapping = inode->i_mapping;
 		struct backing_dev_info *bdi = mapping->backing_dev_info;
 		long pages_skipped;
+		int ret;
 
 		if (!bdi_cap_writeback_dirty(bdi)) {
 			list_move(&inode->i_list, &sb->s_dirty);
@@ -365,7 +366,8 @@ sync_sb_inodes(struct super_block *sb, s
 		BUG_ON(inode->i_state & I_FREEING);
 		__iget(inode);
 		pages_skipped = wbc->pages_skipped;
-		__writeback_single_inode(inode, wbc);
+		ret = __writeback_single_inode(inode, wbc);
+		mapping_set_error(mapping, ret);
 		if (wbc->sync_mode == WB_SYNC_HOLD) {
 			inode->dirtied_when = jiffies;
 			list_move(&inode->i_list, &sb->s_dirty);


Re: PROBLEM: 2.6.19 + highmem = BUG at do_wp_page

2006-12-29 Thread Sami Farin
On Tue, Dec 12, 2006 at 15:10:56 -0500, Chuck Ebbert wrote:
> In-Reply-To: <[EMAIL PROTECTED]>
> 
> On Tue, 5 Dec 2006 19:25:13 +0200, Sami Farin wrote:
> 
> > BUG: unable to handle kernel paging request at virtual address fffb9dc0
> 
> > eax: fffb8000   ebx: fffb9000   ecx: 0090   edx: 
> > esi: fffb9dc0   edi: fffb8dc0   ebp: f6f89f24   esp: f6f89ef0
> 
>   1f:   89 de mov%ebx,%esi
>   21:   b9 00 04 00 00mov$0x400,%ecx
>   26:   89 45 cc  mov%eax,0xffcc(%ebp)
>   29:   89 c7 mov%eax,%edi
> 
>0:   f3 a5 repz movsl %ds:(%esi),%es:(%edi)   <=
> 
> Processor started to copy a page, then with 576 bytes left to copy
> the source page got unmapped.  Nice.
> 
> This possibly happened during a device interrupt. What does
> /proc/interrupts say?
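The 576-byte figure in the quoted analysis follows from the register dump. The three registers involved are:

- esi, which holds the faulting source address;
- ebx, which holds the start of the source page (from mov %ebx,%esi);
- ecx, which holds the remaining dword count of the rep movsl that started at 0x400.

```latex
\begin{aligned}
\text{copied} &= \mathtt{esi} - \mathtt{ebx} = \mathtt{0xfffb9dc0} - \mathtt{0xfffb9000} = \mathtt{0xdc0} = 3520 \text{ bytes} \\
\text{left}   &= 4096 - 3520 = 576 \text{ bytes} = 4 \times \mathtt{ecx} = 4 \times \mathtt{0x90} = 4 \times 144
\end{aligned}
```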

My system now works with HIGHMEM (got 103 MB extra mem for use).

I Fixed™ this by nuking the crypto API out of the fortuna patch
and including my own functions...

$ cat /proc/sys/kernel/random/cipher_algo 
Snuffle 2005

hehe..



[PATCH 1/2] Factor outstanding I/O error handling

2006-12-29 Thread Guillaume Chazarain

Against 2.6.20-rc2, first some cleanup, then a bug fix.

--
Guillaume

Cleanup: setting an outstanding error on a mapping was open-coded too
many times. Factor it out into mapping_set_error().
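The helper encodes one rule. A zero return is ignored, -ENOSPC sets the ENOSPC flag, and any other error sets the EIO flag. The sketch below is a userspace model of that rule. It assumes hypothetical bit numbers and a plain OR in place of the kernel's atomic set_bit().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the AS_EIO / AS_ENOSPC bit numbers. */
enum { MODEL_AS_EIO = 0, MODEL_AS_ENOSPC = 1 };

struct model_mapping { unsigned long flags; };

/* Same decision logic as mapping_set_error(), without atomics. */
static void model_mapping_set_error(struct model_mapping *mapping, int error)
{
	if (error) {
		if (error == -ENOSPC)
			mapping->flags |= 1UL << MODEL_AS_ENOSPC;
		else
			mapping->flags |= 1UL << MODEL_AS_EIO;
	}
}
```

Because the zero check lives inside the helper, callers such as the writepage loops can pass the return value unconditionally.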

Signed-off-by: Guillaume Chazarain <[EMAIL PROTECTED]>
---

 fs/gfs2/glops.c |5 +
 fs/mpage.c  |   16 ++--
 include/linux/pagemap.h |   10 ++
 mm/page-writeback.c |7 +--
 mm/vmscan.c |8 ++--
 5 files changed, 16 insertions(+), 30 deletions(-)

diff -r 3859b1144d3a fs/gfs2/glops.c
--- a/fs/gfs2/glops.c	Sun Dec 24 05:00:03 2006 +
+++ b/fs/gfs2/glops.c	Fri Dec 29 22:13:21 2006 +0100
@@ -213,10 +213,7 @@ static void inode_go_sync(struct gfs2_gl
 		if (ip) {
 			struct address_space *mapping = ip->i_inode.i_mapping;
 			int error = filemap_fdatawait(mapping);
-			if (error == -ENOSPC)
-set_bit(AS_ENOSPC, &mapping->flags);
-			else if (error)
-set_bit(AS_EIO, &mapping->flags);
+			mapping_set_error(mapping, error);
 		}
 		clear_bit(GLF_DIRTY, &gl->gl_flags);
 		gfs2_ail_empty_gl(gl);
diff -r 3859b1144d3a fs/mpage.c
--- a/fs/mpage.c	Sun Dec 24 05:00:03 2006 +
+++ b/fs/mpage.c	Fri Dec 29 22:12:09 2006 +0100
@@ -663,12 +663,7 @@ confused:
 	/*
 	 * The caller has a ref on the inode, so *mapping is stable
 	 */
-	if (*ret) {
-		if (*ret == -ENOSPC)
-			set_bit(AS_ENOSPC, &mapping->flags);
-		else
-			set_bit(AS_EIO, &mapping->flags);
-	}
+	mapping_set_error(mapping, *ret);
 out:
 	return bio;
 }
@@ -776,14 +771,7 @@ retry:
 
 			if (writepage) {
 ret = (*writepage)(page, wbc);
-if (ret) {
-	if (ret == -ENOSPC)
-		set_bit(AS_ENOSPC,
-			&mapping->flags);
-	else
-		set_bit(AS_EIO,
-			&mapping->flags);
-}
+mapping_set_error(mapping, ret);
 			} else {
 bio = __mpage_writepage(bio, page, get_block,
 		&last_block_in_bio, &ret, wbc,
diff -r 3859b1144d3a include/linux/pagemap.h
--- a/include/linux/pagemap.h	Sun Dec 24 05:00:03 2006 +
+++ b/include/linux/pagemap.h	Fri Dec 29 22:09:01 2006 +0100
@@ -19,6 +19,16 @@
 #define	AS_EIO		(__GFP_BITS_SHIFT + 0)	/* IO error on async write */
 #define AS_ENOSPC	(__GFP_BITS_SHIFT + 1)	/* ENOSPC on async write */
 
+static inline void mapping_set_error(struct address_space * mapping, int error)
+{
+	if (error) {
+		if (error == -ENOSPC)
+			set_bit(AS_ENOSPC, &mapping->flags);
+		else
+			set_bit(AS_EIO, &mapping->flags);
+	}
+}
+
 static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
 {
 	return (__force gfp_t)mapping->flags & __GFP_BITS_MASK;
diff -r 3859b1144d3a mm/page-writeback.c
--- a/mm/page-writeback.c	Sun Dec 24 05:00:03 2006 +
+++ b/mm/page-writeback.c	Fri Dec 29 22:19:30 2006 +0100
@@ -651,12 +651,7 @@ retry:
 			}
 
 			ret = (*writepage)(page, wbc);
-			if (ret) {
-if (ret == -ENOSPC)
-	set_bit(AS_ENOSPC, &mapping->flags);
-else
-	set_bit(AS_EIO, &mapping->flags);
-			}
+			mapping_set_error(mapping, ret);
 
 			if (unlikely(ret == AOP_WRITEPAGE_ACTIVATE))
 unlock_page(page);
diff -r 3859b1144d3a mm/vmscan.c
--- a/mm/vmscan.c	Sun Dec 24 05:00:03 2006 +
+++ b/mm/vmscan.c	Fri Dec 29 22:14:33 2006 +0100
@@ -284,12 +284,8 @@ static void handle_write_error(struct ad
 struct page *page, int error)
 {
 	lock_page(page);
-	if (page_mapping(page) == mapping) {
-		if (error == -ENOSPC)
-			set_bit(AS_ENOSPC, &mapping->flags);
-		else
-			set_bit(AS_EIO, &mapping->flags);
-	}
+	if (page_mapping(page) == mapping)
+		mapping_set_error(mapping, error);
 	unlock_page(page);
 }
 


Re: [Patch] scsi: megaraid_{mm,mbox}: init fix for kdump

2006-12-29 Thread Randy Dunlap
On Fri, 29 Dec 2006 08:02:17 -0800 Sumant Patro wrote:

See Documentation/SubmittingPatches:
Please include output of "diffstat -p1 -w70" so that we can easily see
the scope of the changes.

and see Documentation/CodingStyle for comments below:


> diff -uprN linux-2.6.orig/drivers/scsi/megaraid/megaraid_mbox.c linux-2.6.new/drivers/scsi/megaraid/megaraid_mbox.c
> --- linux-2.6.orig/drivers/scsi/megaraid/megaraid_mbox.c 2006-12-28 09:56:04.0 -0800
> +++ linux-2.6.new/drivers/scsi/megaraid/megaraid_mbox.c 2006-12-29 05:31:48.0 -0800
> @@ -779,6 +780,22 @@ megaraid_init_mbox(adapter_t *adapter)
>   goto out_release_regions;
>   }
>  
> + // initialize the mutual exclusion lock for the mailbox
> + spin_lock_init(&raid_dev->mailbox_lock);

Linux uses /*...*/ C89-style comments, not // C99 comments.

> + // allocate memory required for commands
> + if (megaraid_alloc_cmd_packets(adapter) != 0) {
> + goto out_iounmap;
> + }
> +
> + /*
> +  * Issue SYNC cmd to flush the pending cmds in the adapter
> +  * and initialize its internal state
> +  */
> +
> + if (megaraid_mbox_fire_sync_cmd(adapter))
> + con_log(CL_ANN, ("megaraid: sync cmd failed\n"));
> +

>   // Product info
>   if (megaraid_mbox_product_info(adapter) != 0) {
> - goto out_alloc_cmds;
> + goto out_free_irq;

Don't use {} braces around 1-statement "blocks".

> @@ -875,7 +883,7 @@ megaraid_init_mbox(adapter_t *adapter)
>* accessed
>*/
>   if (megaraid_sysfs_alloc_resources(adapter) != 0) {
> - goto out_alloc_cmds;
> + goto out_free_irq;

Ditto.

>   }
>  
>   // Set the DMA mask to 64-bit. All supported controllers as capable of
> @@ -3380,6 +3388,86 @@ megaraid_mbox_flush_cache(adapter_t *ada
>  
>  
>  /**
> + * megaraid_mbox_fire_sync_cmd - fire the sync cmd
> + * @param adapter: soft state for the controller
> + */
> +static int
> +megaraid_mbox_fire_sync_cmd(adapter_t *adapter)
> +{
> + mbox_t  *mbox;
> + uint8_t raw_mbox[sizeof(mbox_t)];
> + mraid_device_t  *raid_dev = ADAP2RAIDDEV(adapter);
> + mbox64_t *mbox64;
> + uint8_t status = 0;
> + int i;
> + uint32_t dword;
> +
> + mbox = (mbox_t *)raw_mbox;
> +
> + memset((caddr_t)raw_mbox, 0, sizeof(mbox_t));
> +
> + raw_mbox[0] = 0xFF;
> +
> + mbox64  = raid_dev->mbox64;
> + mbox= raid_dev->mbox;
> +
> + /*
> +  * Wait until mailbox is free
> +  */
> + if (megaraid_busywait_mbox(raid_dev) != 0) {
> + status = 1;
> + goto blocked_mailbox;
> + }
> +
> + /*
> +  * Copy mailbox data into host structure
> +  */
> + memcpy((caddr_t)mbox, (caddr_t)raw_mbox, 16);
> + mbox->cmdid = 0xFE;
> + mbox->busy  = 1;
> + mbox->poll  = 0;
> + mbox->ack   = 0;
> + mbox->numstatus = 0;
> + mbox->status= 0;
> +
> + wmb();
> + WRINDOOR(raid_dev, raid_dev->mbox_dma | 0x1);
> +
> + // wait for maximum 1 min for status to post.
> + // If the Firmware SUPPORTS the ABOVE COMMAND,
> + // mbox->cmd will be set to 0
> + // else
> + // the firmware will reject the command with
> + // mbox->numstatus set to 1

Don't use // comment style.  Also, for multi-line comments
in Linux, please use this preferred style:

/*
 * This is the preferred style for multi-line
 * comments in the Linux kernel source code.
 * Please use it consistently.
 *
 * Description:  A column of asterisks on the left side,
 * with beginning and ending almost-blank lines.
 */

Thanks.
---
~Randy


[Patch] scsi: megaraid_{mm,mbox}: init fix for kdump

2006-12-29 Thread Sumant Patro

- Changes in initialization to fix a kdump failure.
Without this fix, the megaraid driver either panics or fails to
initialize the adapter during the boot of kdump's second kernel
if there are pending commands or interrupts from other devices
sharing the same IRQ.
Fix: send a SYNC command on loading.
This command clears the pending commands in the adapter
and re-initializes its internal RAID structure.
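Besides sending SYNC, the patch moves the mailbox lock initialization and megaraid_alloc_cmd_packets() ahead of IRQ registration. This reorders the error gotos: an IRQ failure now has command packets to free, and later failures must free the IRQ first. The usual kernel idiom is to unwind in reverse acquisition order. The label order at the end of megaraid_init_mbox() is not shown here, so the sketch below only illustrates the idiom, not the driver's exact code. Its steps are hypothetical: M maps registers, A allocates command packets, I requests the IRQ, and lowercase letters undo each step.

```c
#include <assert.h>
#include <string.h>

static char trace[16]; /* acquire steps in upper case, releases in lower case */

static void step(const char *s)
{
	strcat(trace, s);
}

/* fail_at: 0 = success, 1 = packet allocation fails,
 * 2 = IRQ registration fails, 3 = a later step (e.g. product info) fails. */
static int model_init(int fail_at)
{
	trace[0] = '\0';

	step("M");                 /* map the controller registers */
	if (fail_at == 1)
		goto out_iounmap;
	step("A");                 /* allocate command packets, now before the IRQ */
	if (fail_at == 2)
		goto out_free_cmds;
	step("I");                 /* register the interrupt handler */
	if (fail_at == 3)
		goto out_free_irq;
	return 0;

out_free_irq:
	step("i");
out_free_cmds:
	step("a");
out_iounmap:
	step("m");
	return -1;
}
```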

Signed-off-by: Sumant Patro <[EMAIL PROTECTED]>

diff -uprN linux-2.6.orig/Documentation/scsi/ChangeLog.megaraid linux-2.6.new/Documentation/scsi/ChangeLog.megaraid
--- linux-2.6.orig/Documentation/scsi/ChangeLog.megaraid 2006-12-28 10:10:31.0 -0800
+++ linux-2.6.new/Documentation/scsi/ChangeLog.megaraid 2006-12-29 06:57:44.0 -0800
@@ -1,3 +1,16 @@
+Release Date : Thu Nov 16 15:32:35 EST 2006 - Sumant Patro <[EMAIL PROTECTED]>
+Current Version : 2.20.5.1 (scsi module), 2.20.2.6 (cmm module)
+Older Version : 2.20.4.9 (scsi module), 2.20.2.6 (cmm module)
+
+1. Changes in Initialization to fix kdump failure.
+Without this fix, megaraid driver either panics or fails to
+initialize the adapter during the kdump's 2nd kernel boot
+if there are pending commands or interrupts from other devices
+sharing the same IRQ.
+  - Send SYNC command on loading.
+This command clears the pending commands in the adapter
+and re-initialize its internal RAID structure.
+
 Release Date : Fri May 19 09:31:45 EST 2006 - Seokmann Ju <[EMAIL PROTECTED]>
 Current Version : 2.20.4.9 (scsi module), 2.20.2.6 (cmm module)
 Older Version : 2.20.4.8 (scsi module), 2.20.2.6 (cmm module)
diff -uprN linux-2.6.orig/drivers/scsi/megaraid/megaraid_mbox.c linux-2.6.new/drivers/scsi/megaraid/megaraid_mbox.c
--- linux-2.6.orig/drivers/scsi/megaraid/megaraid_mbox.c 2006-12-28 09:56:04.0 -0800
+++ linux-2.6.new/drivers/scsi/megaraid/megaraid_mbox.c 2006-12-29 05:31:48.0 -0800
@@ -10,13 +10,13 @@
  *2 of the License, or (at your option) any later version.
  *
  * FILE: megaraid_mbox.c
- * Version : v2.20.4.9 (Jul 16 2006)
+ * Version : v2.20.5.1 (Nov 16 2006)
  *
  * Authors:
- * Atul Mukker <[EMAIL PROTECTED]>
- * Sreenivas Bagalkote <[EMAIL PROTECTED]>
- * Manoj Jose  <[EMAIL PROTECTED]>
- * Seokmann Ju <[EMAIL PROTECTED]>
+ * Atul Mukker <[EMAIL PROTECTED]>
+ * Sreenivas Bagalkote <[EMAIL PROTECTED]>
+ * Manoj Jose  <[EMAIL PROTECTED]>
+ * Seokmann Ju <[EMAIL PROTECTED]>
  *
  * List of supported controllers
  *
@@ -107,6 +107,7 @@ static int megaraid_mbox_support_random_
 static int megaraid_mbox_get_max_sg(adapter_t *);
 static void megaraid_mbox_enum_raid_scsi(adapter_t *);
 static void megaraid_mbox_flush_cache(adapter_t *);
+static int megaraid_mbox_fire_sync_cmd(adapter_t *);
 
 static void megaraid_mbox_display_scb(adapter_t *, scb_t *);
 static void megaraid_mbox_setup_device_map(adapter_t *);
@@ -137,7 +138,7 @@ static int wait_till_fw_empty(adapter_t 
 
 
 
-MODULE_AUTHOR("[EMAIL PROTECTED]");
+MODULE_AUTHOR("[EMAIL PROTECTED]");
 MODULE_DESCRIPTION("LSI Logic MegaRAID Mailbox Driver");
 MODULE_LICENSE("GPL");
 MODULE_VERSION(MEGARAID_VERSION);
@@ -779,6 +780,22 @@ megaraid_init_mbox(adapter_t *adapter)
goto out_release_regions;
}
 
+   // initialize the mutual exclusion lock for the mailbox
+   spin_lock_init(&raid_dev->mailbox_lock);
+
+   // allocate memory required for commands
+   if (megaraid_alloc_cmd_packets(adapter) != 0) {
+   goto out_iounmap;
+   }
+
+   /*
+* Issue SYNC cmd to flush the pending cmds in the adapter
+* and initialize its internal state
+*/
+
+   if (megaraid_mbox_fire_sync_cmd(adapter))
+   con_log(CL_ANN, ("megaraid: sync cmd failed\n"));
+
//
// Setup the rest of the soft state using the library of FW routines
//
@@ -789,22 +806,13 @@ megaraid_init_mbox(adapter_t *adapter)
 
con_log(CL_ANN, (KERN_WARNING
"megaraid: Couldn't register IRQ %d!\n", adapter->irq));
+   goto out_alloc_cmds;
 
-   goto out_iounmap;
-   }
-
-
-   // initialize the mutual exclusion lock for the mailbox
-   spin_lock_init(&raid_dev->mailbox_lock);
-
-   // allocate memory required for commands
-   if (megaraid_alloc_cmd_packets(adapter) != 0) {
-   goto out_free_irq;
}
 
// Product info
if (megaraid_mbox_product_info(adapter) != 0) {
-   goto out_alloc_cmds;
+   goto out_free_irq;
}
 
// Do we support extended CDBs
@@ -875,7 +883,7 @@ megaraid_init_mbox(adapter_t *adapter)
 * accessed
 */
if (megaraid_sysfs_alloc_resources(adapter) != 0) {
-   goto out_alloc_cmds;
+   goto out_free_irq;
}
 
// Set the DMA mask to 64-bit. All 

Re: [patch] remove MAX_ARG_PAGES

2006-12-29 Thread Linus Torvalds


On Fri, 29 Dec 2006, Russell King wrote:
> 
> Suggest you test (eg) a rebuild of libX11 to see how it reacts to
> this patch.

Also: please rebuild "xargs" and install it first. Otherwise, a lot of
build scripts etc. that use "xargs" won't ever trigger the new limits (or
lack thereof), because xargs will have been installed with some old
limits.

Perhaps more worrying is if compiling xargs under a new kernel then means 
that it won't work correctly under an old one.

Linus


[Patch] scsi:megaraid_sas:Update module author

2006-12-29 Thread Sumant Patro
Update the domain name from lsil.com to lsi.com.
Change the module author to [EMAIL PROTECTED]

Signed-off-by: Sumant Patro <[EMAIL PROTECTED]>

diff -uprN linux-2.6.orig/drivers/scsi/megaraid/megaraid_sas.c linux-2.6.new/drivers/scsi/megaraid/megaraid_sas.c
--- linux-2.6.orig/drivers/scsi/megaraid/megaraid_sas.c	2006-12-28 09:56:05.0 -0800
+++ linux-2.6.new/drivers/scsi/megaraid/megaraid_sas.c	2006-12-29 07:17:21.0 -0800
@@ -13,8 +13,8 @@
  * Version	: v00.00.03.05
  *
  * Authors:
- * 	Sreenivas Bagalkote	<[EMAIL PROTECTED]>
- * 	Sumant Patro		<[EMAIL PROTECTED]>
+ * 	Sreenivas Bagalkote	<[EMAIL PROTECTED]>
+ * 	Sumant Patro		<[EMAIL PROTECTED]>
  *
  * List of supported controllers
  *
@@ -45,7 +45,7 @@
 
 MODULE_LICENSE("GPL");
 MODULE_VERSION(MEGASAS_VERSION);
-MODULE_AUTHOR("[EMAIL PROTECTED]");
+MODULE_AUTHOR("[EMAIL PROTECTED]");
 MODULE_DESCRIPTION("LSI Logic MegaRAID SAS Driver");
 
 /*



Re: PROBLEM: setup apm as module version 2.4.34

2006-12-29 Thread bert hubert
On Fri, Dec 29, 2006 at 07:28:55PM +0100, Dr.-Ing. Ingo D. Rullhusen wrote:
> i hope that's the right address for this little problem, which arises 
> with linux kernel 2.4.34.
> 
> If i compile the Advanced Power Management as module it do not work. If 
> i try a depmod i get an unresolved symbols message and so it cannot be 
> loaded of course.

If you mention the exact unresolved symbols, people might be able to help
you better.

Bert

-- 
http://www.PowerDNS.com  Open source, database driven DNS Software 
http://netherlabs.nl  Open and Closed source services


[PATCH -rt] scheduling while atomic in remove_proc_entry()

2006-12-29 Thread Daniel Walker
Found this in 2.6.20-rc2-rt3 with PREEMPT_RT off on x86_64:

BUG: scheduling while atomic: swapper/0x0001/1, CPU#0

Call Trace:
 [] __sched_text_start+0xb0/0x85a
 [] try_to_wake_up+0x3fc/0x420
 [] add_preempt_count+0x2b/0x130
 [] schedule+0xe5/0x110
 [] flush_cpu_workqueue+0x8d/0xd0
 [] autoremove_wake_function+0x0/0x40
 [] filevec_add_drain_per_cpu+0x0/0x80
 [] flush_workqueue+0x73/0xa0
 [] schedule_on_each_cpu_wq+0xea/0x110
 [] add_preempt_count+0x2b/0x130
 [] filevec_add_drain_all+0x17/0x20
 [] remove_proc_entry+0xb0/0x230
 [] unregister_handler_proc+0x2d/0x60
 [] free_irq+0xfc/0x150
 [] i8042_probe+0x30d/0x610
 [] platform_drv_probe+0x12/0x20
 [] really_probe+0x9b/0x140
 [] driver_probe_device+0xb8/0xd0
 [] __device_attach+0x0/0x10
 [] __device_attach+0x9/0x10
 [] bus_for_each_drv+0x4c/0x90
 [] device_attach+0x6f/0x90
 [] bus_attach_device+0x2e/0x70
 [] device_add+0x3d8/0x5b0
 [] platform_device_add+0x13f/0x180
 [] i8042_init+0x72/0xb0
 [] init+0x172/0x3c0
 [] child_rip+0xa/0x12
 [] init+0x0/0x3c0
 [] child_rip+0x0/0x12

---
| preempt count: 0001 ]
| 1-level deep critical section nesting:

. []  __spin_lock+0x16/0x80
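[] ..   ( <= remove_proc_entry+0x4b/0x230)

remove_proc_entry() calls proc_kill_inodes() with proc_subdir_lock held,
and proc_kill_inodes() called filevec_add_drain_all(), which can sleep (the
trace shows it flushing a workqueue). This patch moves the drain out of
proc_kill_inodes() to just after spin_unlock(&proc_subdir_lock).
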
Signed-Off-By: Daniel Walker <[EMAIL PROTECTED]>

---
 fs/proc/generic.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6.19/fs/proc/generic.c
===
--- linux-2.6.19.orig/fs/proc/generic.c
+++ linux-2.6.19/fs/proc/generic.c
@@ -555,7 +555,6 @@ static void proc_kill_inodes(struct proc
/*
 * Actually it's a partial revoke().
 */
-   filevec_add_drain_all();
lock_list_for_each_entry(filp, &sb->s_files, f_u.fu_llist) {
struct dentry * dentry = filp->f_path.dentry;
struct inode * inode;
@@ -738,6 +737,8 @@ void remove_proc_entry(const char *name,
break;
}
spin_unlock(&proc_subdir_lock);
+
+   filevec_add_drain_all();
 out:
return;
 }
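The rule this patch restores is that nothing which may sleep runs while a spinlock is held. The sketch below models why moving the call past the unlock is enough. It assumes a hypothetical counter standing in for the preempt count; it is not the kernel's might_sleep() machinery.

```c
#include <assert.h>

static int preempt_count;    /* > 0 while the modeled spinlock is held */
static int sleeps_in_atomic; /* counts "scheduling while atomic" events */

static void model_spin_lock(void)   { preempt_count++; }
static void model_spin_unlock(void) { preempt_count--; }

/* Stand-in for filevec_add_drain_all(): it may sleep, so flag atomic callers. */
static void model_drain_all(void)
{
	if (preempt_count > 0)
		sleeps_in_atomic++;
}

static void remove_entry_before_fix(void)
{
	model_spin_lock();
	model_drain_all();   /* reached via proc_kill_inodes() under the lock */
	model_spin_unlock();
}

static void remove_entry_after_fix(void)
{
	model_spin_lock();
	/* ... unlink the entry ... */
	model_spin_unlock();
	model_drain_all();   /* moved after spin_unlock() */
}
```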
--


Re: OpenSolaris under KVM?

2006-12-29 Thread Parag Warudkar
Avi Kivity  argo.co.il> writes:

> 
> John Freighter wrote:
> > Has anybody succeded running OpenSolaris under KVM virtualization?
> > Before I download OS install DVD in vain...
> >
> 
> There was indeed a report (and a patch) from Michael Riepe to that 
> effect.  -rc2 should contain that patch.  Please report to kvm-devel if 
> it doesn't work.
> 


I tried installing Solaris 10 U2 with QEMU/KVM-8 on -rc2 plus
the latest 8 kernel-side KVM patches.
It appeared to work well until about 80% into the installation,
where it got stuck after this error in dmesg:

vmwrite error: reg 6802 value 1c334000 (err 17408)

This was on a Core Duo Mac Mini.

Parag



[PATCH 4/4] Char: mxser_new, fix twice resource releasing

2006-12-29 Thread Jiri Slaby
mxser_new, fix twice resource releasing

Because brd->info is not NULLed, resources are released twice. NULL it in
the pci_remove function. Also take care of retval and releasing in pci_probe:
mxser_initbrd already releases the resources, so do not release them again
in the fail path of the probe function.
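The double release happens when mxser_remove() tears a board down and a second path then frees every board whose brd->info is still set. That second path is an assumption drawn from the description above; it is not in the hunks shown. The sketch below is a userspace model that counts releases, with all names hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct model_info { int nports; };
struct model_board { const struct model_info *info; }; /* non-NULL: in use */

static int releases; /* how many times board resources were freed */

/* Stand-in for mxser_remove(); @fixed applies the patch's brd->info = NULL. */
static void model_pci_remove(struct model_board *brd, int fixed)
{
	releases++;          /* mxser_release_res() */
	if (fixed)
		brd->info = NULL;
}

/* Stand-in for a cleanup path that frees every board still marked in use. */
static void model_cleanup_all(struct model_board *brds, int n)
{
	for (int i = 0; i < n; i++)
		if (brds[i].info)
			releases++;
}
```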

Cc: Sergei Organov <[EMAIL PROTECTED]>
Signed-off-by: Jiri Slaby <[EMAIL PROTECTED]>

---
commit 549237a65498ad3880cd1ca40f23f8bc942041cb
tree 8208eb0eb881aa6bd1532c90a60c72009415e3e1
parent 5065aa25fd624e3477d993baebbf3255a1d492fa
author Jiri Slaby <[EMAIL PROTECTED]> Fri, 29 Dec 2006 21:38:56 +0059
committer Jiri Slaby <[EMAIL PROTECTED]> Fri, 29 Dec 2006 21:38:56 +0059

 drivers/char/mxser_new.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/char/mxser_new.c b/drivers/char/mxser_new.c
index 042d138..f078ddf 100644
--- a/drivers/char/mxser_new.c
+++ b/drivers/char/mxser_new.c
@@ -2403,9 +2403,8 @@ static int __devinit mxser_initbrd(struct mxser_board *brd,
brd->info->name, brd->irq);
/* We hold resources, we need to release them. */
mxser_release_res(brd, pdev, 0);
-   return retval;
}
-   return 0;
+   return retval;
 }
 
 static int __init mxser_get_ISA_conf(int cap, struct mxser_board *brd)
@@ -2590,8 +2589,9 @@ static int __devinit mxser_probe(struct pci_dev *pdev,
}
 
/* mxser_initbrd will hook ISR. */
-   if (mxser_initbrd(brd, pdev) < 0)
-   goto err_relvec;
+   retval = mxser_initbrd(brd, pdev);
+   if (retval)
+   goto err_null;
 
for (i = 0; i < brd->info->nports; i++)
tty_register_device(mxvar_sdriver, brd->idx + i, &pdev->dev);
@@ -2599,10 +2599,9 @@ static int __devinit mxser_probe(struct pci_dev *pdev,
pci_set_drvdata(pdev, brd);
 
return 0;
-err_relvec:
-   pci_release_region(pdev, 3);
 err_relio:
pci_release_region(pdev, 2);
+err_null:
brd->info = NULL;
 err:
return retval;
@@ -2620,6 +2619,7 @@ static void __devexit mxser_remove(struct pci_dev *pdev)
tty_unregister_device(mxvar_sdriver, brd->idx + i);
 
mxser_release_res(brd, pdev, 1);
+   brd->info = NULL;
 }
 
 static struct pci_driver mxser_driver = {


[PATCH 3/4] Char: mxser_new, less loops in isr

2006-12-29 Thread Jiri Slaby
mxser_new, less loops in isr

Loop only 100^2 times, not 9^2 times in isr (at most).
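The bounded loop from this patch can be modelled in userspace as below, assuming a fake interrupt source; struct fake_dev, irq_pending() and service_irq() are hypothetical. Putting the pass counter in the while condition guarantees at most PASS_LIMIT passes even if the hardware never stops asserting an interrupt. In the driver, each pass also runs a per-port inner loop bounded by the same limit, which is where the 100^2 figure comes from.

```c
#define PASS_LIMIT 100	/* mirrors the new MXSER_ISR_PASS_LIMIT */

/* Hypothetical interrupt source: either a few pending events or stuck. */
struct fake_dev {
	unsigned int pending;	/* events left to service */
	int stuck;		/* nonzero: interrupt never deasserts */
};

static int irq_pending(struct fake_dev *dev)
{
	if (dev->stuck)
		return 1;
	if (dev->pending == 0)
		return 0;
	dev->pending--;		/* servicing consumes one event */
	return 1;
}

/* Returns the number of passes made; never more than PASS_LIMIT. */
static unsigned int service_irq(struct fake_dev *dev)
{
	unsigned int pass_counter = 0, passes = 0;

	while (pass_counter++ < PASS_LIMIT) {
		if (!irq_pending(dev))
			break;	/* nothing left: normal exit */
		passes++;	/* per-port servicing would happen here */
	}
	return passes;
}
```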

Signed-off-by: Jiri Slaby <[EMAIL PROTECTED]>

---
commit 5065aa25fd624e3477d993baebbf3255a1d492fa
tree a4b05ea113ceea8b8ad1382fa3a5778473597d0f
parent cc46acb974ba967794f7b199fb65ad4abd9531b7
author Jiri Slaby <[EMAIL PROTECTED]> Fri, 29 Dec 2006 21:07:10 +0059
committer Jiri Slaby <[EMAIL PROTECTED]> Fri, 29 Dec 2006 21:07:10 +0059

 drivers/char/mxser_new.c |9 +++--
 1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/char/mxser_new.c b/drivers/char/mxser_new.c
index 945c7e1..042d138 100644
--- a/drivers/char/mxser_new.c
+++ b/drivers/char/mxser_new.c
@@ -56,7 +56,7 @@
 #define MXSER_BOARDS   4   /* Max. boards */
 #define MXSER_PORTS_PER_BOARD  8   /* Max. ports per board */
 #define MXSER_PORTS   (MXSER_BOARDS * MXSER_PORTS_PER_BOARD)
-#define MXSER_ISR_PASS_LIMIT   9L
+#define MXSER_ISR_PASS_LIMIT   100
 
 #define MXSER_ERR_IOADDR   -1
 #define MXSER_ERR_IRQ   -2
@@ -,8 +,7 @@ static irqreturn_t mxser_interrupt(int irq, void *dev_id)
struct mxser_board *brd = NULL;
struct mxser_port *port;
int max, irqbits, bits, msr;
-   int pass_counter = 0;
-   unsigned int int_cnt;
+   unsigned int int_cnt, pass_counter = 0;
int handled = IRQ_NONE;
 
for (i = 0; i < MXSER_BOARDS; i++)
@@ -2237,7 +2236,7 @@ static irqreturn_t mxser_interrupt(int irq, void *dev_id)
if (brd == NULL)
goto irq_stop;
max = brd->info->nports;
-   while (1) {
+   while (pass_counter++ < MXSER_ISR_PASS_LIMIT) {
irqbits = inb(brd->vector) & brd->vector_mask;
if (irqbits == brd->vector_mask)
break;
@@ -2308,8 +2307,6 @@ static irqreturn_t mxser_interrupt(int irq, void *dev_id)
} while (int_cnt++ < MXSER_ISR_PASS_LIMIT);
spin_unlock(&port->slock);
}
-   if (pass_counter++ > MXSER_ISR_PASS_LIMIT)
-   break;  /* Prevent infinite loops */
}
 
 irq_stop:


[PATCH 2/4] Char: mxser_new, header file cleanup

2006-12-29 Thread Jiri Slaby
mxser_new, header file cleanup

- Remove no longer used macros
- Move some macros from the header to the code
- Remove c++ comments
- Align backslashes to one column

Signed-off-by: Jiri Slaby <[EMAIL PROTECTED]>

---
commit cc46acb974ba967794f7b199fb65ad4abd9531b7
tree 56be354d64287e9b79013b8d8d35fd6ca4dc0d49
parent 15e7e157283a86bb819a0193a8cb137d7640d3a6
author Jiri Slaby <[EMAIL PROTECTED]> Fri, 29 Dec 2006 20:57:07 +0059
committer Jiri Slaby <[EMAIL PROTECTED]> Fri, 29 Dec 2006 20:57:07 +0059

 drivers/char/mxser_new.c |7 -
 drivers/char/mxser_new.h |  461 +++---
 2 files changed, 154 insertions(+), 314 deletions(-)

diff --git a/drivers/char/mxser_new.c b/drivers/char/mxser_new.c
index ec61cf8..945c7e1 100644
--- a/drivers/char/mxser_new.c
+++ b/drivers/char/mxser_new.c
@@ -53,8 +53,6 @@
 #define MXSERMAJOR   174
 #define MXSERCUMAJOR 175
 
-#define MXSER_EVENT_TXLOW   1
-
 #define MXSER_BOARDS   4   /* Max. boards */
 #define MXSER_PORTS_PER_BOARD  8   /* Max. ports per board */
 #define MXSER_PORTS   (MXSER_BOARDS * MXSER_PORTS_PER_BOARD)
@@ -65,6 +63,11 @@
 #define MXSER_ERR_IRQ_CONFLIT   -3
 #define MXSER_ERR_VECTOR   -4
 
+/*CheckIsMoxaMust return value*/
+#define MOXA_OTHER_UART   0x00
+#define MOXA_MUST_MU150_HWID   0x01
+#define MOXA_MUST_MU860_HWID   0x02
+
 #define WAKEUP_CHARS   256
 
 #define UART_MCR_AFE   0x20
diff --git a/drivers/char/mxser_new.h b/drivers/char/mxser_new.h
index 55b34a0..04fa5fc 100644
--- a/drivers/char/mxser_new.h
+++ b/drivers/char/mxser_new.h
@@ -26,18 +26,8 @@
 #define RS422_MODE 2
 #define RS485_4WIRE_MODE   3
 #define OP_MODE_MASK   3
-// above add by Victor Yu. 01-05-2004
-
-#define TTY_THRESHOLD_THROTTLE  128
-
-#define LO_WATER   (TTY_FLIPBUF_SIZE)
-#define HI_WATER   (TTY_FLIPBUF_SIZE*2*3/4)
-
-// added by James. 03-11-2004.
-#define MOXA_SDS_GETICOUNTER   (MOXA + 68)
-#define MOXA_SDS_RSTICOUNTER   (MOXA + 69)
-// (above) added by James.
 
+#define MOXA_SDS_RSTICOUNTER   (MOXA + 69)
 #define MOXA_ASPP_OQUEUE   (MOXA + 70)
 #define MOXA_ASPP_SETBAUD  (MOXA + 71)
 #define MOXA_ASPP_GETBAUD  (MOXA + 72)
@@ -46,7 +36,6 @@
 #define MOXA_ASPP_MON_EXT  (MOXA + 75)
 #define MOXA_SET_BAUD_METHOD   (MOXA + 76)
 
-
 /* --- */
 
 #define NPPI_NOTIFY_PARITY 0x01
@@ -55,51 +44,46 @@
 #define NPPI_NOTIFY_SW_OVERRUN 0x08
 #define NPPI_NOTIFY_BREAK  0x10
 
-#define NPPI_NOTIFY_CTSHOLD 0x01   // Tx hold by CTS low
-#define NPPI_NOTIFY_DSRHOLD 0x02   // Tx hold by DSR low
-#define NPPI_NOTIFY_XOFFHOLD   0x08   // Tx hold by Xoff received
-#define NPPI_NOTIFY_XOFFXENT   0x10   // Xoff Sent
-
-//CheckIsMoxaMust return value
-#define MOXA_OTHER_UART   0x00
-#define MOXA_MUST_MU150_HWID   0x01
-#define MOXA_MUST_MU860_HWID   0x02
-
-// follow just for Moxa Must chip define.
-//
-// when LCR register (offset 0x03) write following value,
-// the Must chip will enter enchance mode. And write value
-// on EFR (offset 0x02) bit 6,7 to change bank.
+#define NPPI_NOTIFY_CTSHOLD 0x01   /* Tx hold by CTS low */
+#define NPPI_NOTIFY_DSRHOLD 0x02   /* Tx hold by DSR low */
+#define NPPI_NOTIFY_XOFFHOLD   0x08   /* Tx hold by Xoff received */
+#define NPPI_NOTIFY_XOFFXENT   0x10   /* Xoff Sent */
+
+/* follow just for Moxa Must chip define. */
+/* */
+/* when LCR register (offset 0x03) write following value, */
+/* the Must chip will enter enchance mode. And write value */
+/* on EFR (offset 0x02) bit 6,7 to change bank. */
 #define MOXA_MUST_ENTER_ENCHANCE   0xBF
 
-// when enhance mode enable, access on general bank register
+/* when enhance mode enable, access on general bank register */
 #define MOXA_MUST_GDL_REGISTER 0x07
 #define MOXA_MUST_GDL_MASK 0x7F
 #define MOXA_MUST_GDL_HAS_BAD_DATA 0x80
 
-#define MOXA_MUST_LSR_RERR 0x80   // error in receive FIFO
-// enchance register bank select and enchance mode setting register
-// when LCR register equal to 0xBF
+#define MOXA_MUST_LSR_RERR 0x80   /* error in receive FIFO */
+/* enchance register bank select and enchance mode setting register */
+/* when LCR register equal to 0xBF */
 #define MOXA_MUST_EFR_REGISTER 0x02
-// enchance mode enable
+/* enchance mode enable */
 #define MOXA_MUST_EFR_EFRB_ENABLE  0x10
-// enchance reister bank set 0, 1, 2
+/* enchance reister bank set 0, 1, 2 */
 #define MOXA_MUST_EFR_BANK0   0x00
 #define MOXA_MUST_EFR_BANK1   0x40
 #define MOXA_MUST_EFR_BANK2   0x80
 #define MOXA_MUST_EFR_BANK3   0xC0
 #define MOXA_MUST_EFR_BANK_MASK   0xC0
 
-// set XON1 value register, when LCR=0xBF and change to bank0
+/* set XON

[PATCH 1/4] Char: mxser_new, alter locking in isr

2006-12-29 Thread Jiri Slaby
mxser_new, alter locking in isr

Avoid oopsing when stress-testing open/close: port->tty is sometimes NULL,
but the code dereferences it as though it were always non-NULL.
Receive/transmit chars iff ASYNC_CLOSING is not set and ASYNC_INITIALIZED
is set. Thanks to Sergei for pointing this out and testing.
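A sketch of the condition this patch adds to the ISR, written as a standalone predicate. The flag names PORT_INITIALIZED and PORT_CLOSING and struct port are hypothetical substitutes for ASYNC_INITIALIZED, ASYNC_CLOSING and the driver's port structure. In the patch the check runs with port->slock held, since the lock is now taken once per port in mxser_interrupt() instead of inside the receive/transmit helpers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for ASYNC_INITIALIZED / ASYNC_CLOSING. */
#define PORT_INITIALIZED (1u << 0)
#define PORT_CLOSING     (1u << 1)

struct tty;	/* opaque here; only whether it is present matters */

struct port {
	struct tty *tty;
	unsigned int flags;
};

/*
 * Characters are moved only when the port has a tty, is fully set up,
 * and is not being closed; otherwise the ISR just drains the UART.
 * The caller is assumed to hold the port lock.
 */
static bool port_can_transfer(const struct port *p)
{
	return p->tty != NULL &&
	       (p->flags & PORT_INITIALIZED) &&
	       !(p->flags & PORT_CLOSING);
}
```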

Cc: Sergei Organov <[EMAIL PROTECTED]>
Signed-off-by: Jiri Slaby <[EMAIL PROTECTED]>

---
commit 15e7e157283a86bb819a0193a8cb137d7640d3a6
tree c0d4130f898c835c4e283af0e343ee504345d4c0
parent ab35af25a3d01f1e07fc8de5b96f484b93a8ad2a
author Jiri Slaby <[EMAIL PROTECTED]> Fri, 29 Dec 2006 20:00:21 +0059
committer Jiri Slaby <[EMAIL PROTECTED]> Fri, 29 Dec 2006 20:00:21 +0059

 drivers/char/mxser_new.c  |   22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/drivers/char/mxser_new.c b/drivers/char/mxser_new.c
index 8da8833..ec61cf8 100644
--- a/drivers/char/mxser_new.c
+++ b/drivers/char/mxser_new.c
@@ -2073,9 +2073,6 @@ static void mxser_receive_chars(struct mxser_port *port, 
int *status)
int cnt = 0;
int recv_room;
int max = 256;
-   unsigned long flags;
-
-   spin_lock_irqsave(&port->slock, flags);
 
recv_room = tty->receive_room;
if ((recv_room == 0) && (!port->ldisc_stop_rx))
@@ -2159,7 +2156,6 @@ end_intr:
mxvar_log.rxcnt[port->tty->index] += cnt;
port->mon_data.rxcnt += cnt;
port->mon_data.up_rxcnt += cnt;
-   spin_unlock_irqrestore(&port->slock, flags);
 
tty_flip_buffer_push(tty);
 }
@@ -2167,9 +2163,6 @@ end_intr:
 static void mxser_transmit_chars(struct mxser_port *port)
 {
int count, cnt;
-   unsigned long flags;
-
-   spin_lock_irqsave(&port->slock, flags);
 
if (port->x_char) {
outb(port->x_char, port->ioaddr + UART_TX);
@@ -2178,11 +2171,11 @@ static void mxser_transmit_chars(struct mxser_port 
*port)
port->mon_data.txcnt++;
port->mon_data.up_txcnt++;
port->icount.tx++;
-   goto unlock;
+   return;
}
 
if (port->xmit_buf == 0)
-   goto unlock;
+   return;
 
if ((port->xmit_cnt <= 0) || port->tty->stopped ||
(port->tty->hw_stopped &&
@@ -2190,7 +2183,7 @@ static void mxser_transmit_chars(struct mxser_port *port)
(!port->board->chip_flag))) {
port->IER &= ~UART_IER_THRI;
outb(port->IER, port->ioaddr + UART_IER);
-   goto unlock;
+   return;
}
 
cnt = port->xmit_cnt;
@@ -2215,8 +2208,6 @@ static void mxser_transmit_chars(struct mxser_port *port)
port->IER &= ~UART_IER_THRI;
outb(port->IER, port->ioaddr + UART_IER);
}
-unlock:
-   spin_unlock_irqrestore(&port->slock, flags);
 }
 
 /*
@@ -2257,12 +2248,16 @@ static irqreturn_t mxser_interrupt(int irq, void 
*dev_id)
port = &brd->ports[i];
 
int_cnt = 0;
+   spin_lock(&port->slock);
do {
iir = inb(port->ioaddr + UART_IIR);
if (iir & UART_IIR_NO_INT)
break;
iir &= MOXA_MUST_IIR_MASK;
-   if (!port->tty) {
+   if (!port->tty ||
+   (port->flags & ASYNC_CLOSING) ||
+   !(port->flags &
+   ASYNC_INITIALIZED)) {
status = inb(port->ioaddr + UART_LSR);
outb(0x27, port->ioaddr + UART_FCR);
inb(port->ioaddr + UART_MSR);
@@ -2308,6 +2303,7 @@ static irqreturn_t mxser_interrupt(int irq, void *dev_id)
mxser_transmit_chars(port);
}
} while (int_cnt++ < MXSER_ISR_PASS_LIMIT);
+   spin_unlock(&port->slock);
}
if (pass_counter++ > MXSER_ISR_PASS_LIMIT)
break;  /* Prevent infinite loops */


[PATCH 1/1] Doc: isicom, remove reserved ioctl-number

2006-12-29 Thread Jiri Slaby
isicom, remove reserved ioctl-number

The isicom driver no longer registers a chardev with an ioctl function; it
used to need one for firmware loading. Remove its reserved letter ('M') from
ioctl-number.txt so that the conflict goes away.
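To see why a shared type letter is a problem, here is a small sketch using the Linux userspace <linux/ioctl.h> macros: the letter is stored in the type field of the command number, so two drivers that pick the same letter and sequence number end up with identical commands. DRIVER_A_RESET, DRIVER_B_LOAD, ioctl_type() and ioctl_nr() are hypothetical names.

```c
#include <linux/ioctl.h>	/* _IO(), _IOC_TYPE(), _IOC_NR() */

/* Two hypothetical drivers that both picked letter 'M', number 1. */
#define DRIVER_A_RESET	_IO('M', 0x01)
#define DRIVER_B_LOAD	_IO('M', 0x01)

/* Decode the fields that ioctl-number.txt hands out. */
static unsigned int ioctl_type(unsigned int cmd) { return _IOC_TYPE(cmd); }
static unsigned int ioctl_nr(unsigned int cmd)   { return _IOC_NR(cmd); }
```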

Signed-off-by: Jiri Slaby <[EMAIL PROTECTED]>

---
commit 5bdb7cc0e955ee7724ff519a212aceb706e4814d
tree 27615584648d8776da636a527a16fed31ca51bca
parent 549237a65498ad3880cd1ca40f23f8bc942041cb
author Jiri Slaby <[EMAIL PROTECTED]> Fri, 29 Dec 2006 21:48:23 +0059
committer Jiri Slaby <[EMAIL PROTECTED]> Fri, 29 Dec 2006 21:48:23 +0059

 Documentation/ioctl-number.txt |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/Documentation/ioctl-number.txt b/Documentation/ioctl-number.txt
index 5a8bd5b..8f750c0 100644
--- a/Documentation/ioctl-number.txt
+++ b/Documentation/ioctl-number.txt
@@ -94,8 +94,7 @@ Code  Seq#Include FileComments
 'L'00-1F   linux/loop.h
 'L'E0-FF   linux/ppdd.hencrypted disk device driver


-'M'all linux/soundcard.h   conflict!
-'M'00-1F   linux/isicom.h  conflict!
+'M'all linux/soundcard.h
 'N'00-1F   drivers/usb/scanner.h
 'P'all linux/soundcard.h
 'Q'all linux/soundcard.h


Re: kobject_add unreachable code

2006-12-29 Thread Olaf Dietsche
Martin Stoilov <[EMAIL PROTECTED]> writes:

> Martin Stoilov wrote:
>> Olaf Dietsche wrote:
>>   
>>> Martin Stoilov <[EMAIL PROTECTED]> writes:
>>>
>>>   
>>> 
>>>> The following code in kobject_add
>>>> if (!kobj->k_name)
>>>>         kobj->k_name = kobj->name;
>>>> if (!kobj->k_name) {
>>>>         pr_debug("kobject attempted to be registered with no name!\n");
>>>>         WARN_ON(1);
>>>>         return -EINVAL;
>>>> }
>>>>
>>>> doesn't look right to me. The second 'if' statement looks useless after
>>>> the assignment in the first one. Maybe it was meant to be like:
>>>> if (!*kobj->k_name)
>>>>
>>>
>>> The second test is true, if kobj->name is NULL as well.
>>>   
>>> 
>> And how would that ever be true? kobj->name is a buffer inside kobj:
>>
>> struct kobject  {
>>  const char  * k_name;
>>  charname 
>> [KOBJ_NAME_LEN 
>> ];
>>
>> kobj->name will not be NULL, even if kobj itself is NULL.
>>   
>
> Oops, I am sorry for sending badly formatted text! Here it is:
>
> I don't understand how that would ever be true. kobj->name is a buffer inside
> kobj:
>
> struct kobject {
>         const char      * k_name;
>         char            name[KOBJ_NAME_LEN];
>
> kobj->name will not be NULL, even if kobj itself is NULL.

Shame on me! I just looked at kobject_add() without a clue about struct
kobject. You're right, of course.
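A userspace sketch of Martin's point, using a cut-down struct with hypothetical names (struct obj, NAME_LEN, obj_has_no_name()): once k_name falls back to the embedded name array it can never be NULL, so the only useful check left is for an empty string, as in his suggested `!*kobj->k_name`.

```c
#include <stddef.h>

#define NAME_LEN 20	/* hypothetical; stands in for KOBJ_NAME_LEN */

/* Cut-down model of the two name fields discussed above. */
struct obj {
	const char *k_name;
	char name[NAME_LEN];	/* embedded array: its address is never NULL */
};

/* Returns nonzero if the object has no usable name. */
static int obj_has_no_name(struct obj *o)
{
	if (!o->k_name)
		o->k_name = o->name;	/* fallback, as in kobject_add() */
	/* k_name is now non-NULL; test for an empty string instead */
	return !*o->k_name;
}
```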

Regards, Olaf.


Re: [patch] remove MAX_ARG_PAGES

2006-12-29 Thread Russell King
On Fri, Dec 29, 2006 at 09:03:57PM +0100, Ingo Molnar wrote:
> FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to 
> 2.6.20-rc2 and have included it in the -rt kernel. It's working great - 
> i can now finally do a "ls -t patches/*.patch" in my patch repository - 
> something i haven't been able to do for years ;-)

How does the various autoconf stuff react to this?  Eg, I notice the
following in various configure scripts:

checking the maximum length of command line arguments... 32768

Suggest you test (eg) a rebuild of libX11 to see how it reacts to
this patch.
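Rather than probing by trial as configure does, a program can ask the C library for the limit through POSIX sysconf(_SC_ARG_MAX). A minimal sketch follows; args_probably_fit() is a hypothetical helper, the value reported depends on the kernel and libc in use, and this sketch does not establish whether a given libc reflects a kernel without MAX_ARG_PAGES.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/*
 * Rough check of whether an argument vector fits under the limit the C
 * library reports. The kernel's real accounting also includes the
 * environment and other overhead, so treat this as an estimate only.
 */
static int args_probably_fit(char *const argv[])
{
	long limit = sysconf(_SC_ARG_MAX);	/* -1 if indeterminate */
	size_t total = 0;

	if (limit < 0)
		return 1;			/* no known limit */
	for (size_t i = 0; argv[i] != NULL; i++)
		total += strlen(argv[i]) + 1 + sizeof(char *);
	return total <= (size_t)limit;
}
```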

-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:


select() to /dev/rtc to wait for clock tick timed out

2006-12-29 Thread Ritesh Raj Sarraf
Hi,

I have a Dell XPS M1210 notebook with 2.6.18 running on it.

I've noticed that a recent BIOS upgrade (A03 -> A05) has started giving me this
message on system boot when hwclock runs.

select() to /dev/rtc to wait for clock tick timed out

My machine has a dual-core Intel processor.

A quick search turned up a workaround: run hwclock with the --directisa option.
The bug also seems to be generic to dual-core Intel processors, as many other
people are facing the same issue on servers, desktops and laptops.

Thanks,
Ritesh
-- 
Ritesh Raj Sarraf
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."
"Stealing logic from one person is plagiarism, stealing from many is research."
"The great are those who achieve the impossible, the petty are those who
cannot - rrs"



Re: [patch 2.6.20-rc1 6/6] input: ads7846 directly senses PENUP state

2006-12-29 Thread David Brownell
On Thursday 28 December 2006 10:22 pm, Dmitry Torokhov wrote:
> 
> I appied all patches except for hwmon as it had some issues with CONFIG_HWMON
> handling. Could you please take a look at the patch below and tell me if it
> works for you?

Looked OK, except:

> +#if defined(CONFIG_HWMON) || (defined(MODULE) && 
> defined(CONFIG_HWMON_MODULE))

That idiom is more usually written

#if defined(CONFIG_HWMON) || defined(CONFIG_HWMON_MODULE)
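A minimal preprocessor sketch of the idiom Dave suggests, with the Kconfig result simulated by defining CONFIG_HWMON_MODULE by hand; ads_hwmon_enabled() is a hypothetical name. Unlike Dmitry's version, this form does not also require MODULE to be defined when hwmon itself is modular, so a built-in driver with modular hwmon would take the first branch.

```c
/*
 * Kconfig emits CONFIG_FOO for a built-in (=y) option and
 * CONFIG_FOO_MODULE for a modular (=m) one. Simulate a modular
 * hwmon here by hand.
 */
#define CONFIG_HWMON_MODULE 1

#if defined(CONFIG_HWMON) || defined(CONFIG_HWMON_MODULE)
/* the hwmon glue would be compiled in here */
static int ads_hwmon_enabled(void) { return 1; }
#else
/* otherwise only no-op stubs are built */
static int ads_hwmon_enabled(void) { return 0; }
#endif
```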

Thanks!  I'll be glad to see fewer versions of this driver floating around.
And to see the next version of the ads7843 patches ... :) 

- Dave



Re: 2.6.20-rc2: known unfixed regressions

2006-12-29 Thread Daniel Barkalow
On Fri, 29 Dec 2006, Adrian Bunk wrote:

> On Fri, Dec 29, 2006 at 01:14:13PM -0500, Daniel Barkalow wrote:
> 
> > There's also http://lkml.org/lkml/2006/12/21/47; the included patch breaks 
> > my nVidia devices and probably all PCIX devices, so it's not right, but 
> > something has to be done to fix ATI. My guess is a quirk to say that 
> > pci_intx doesn't work on certain devices and should just be skipped, but 
> > I'm not sure if it's just in combination with MSI or not.
> 
> This:
> - does not seem to be a regression and
> - missing MSI support is not such a big problem.
> 
> Considering how many problems patches in this area tend to cause on 
> different hardware, I'm even inclined to say that such patches should 
> only be added during the 2 weeks merge window before -rc1.

(I was only talking about the first issue/patch as being a regression, 
obviously, and forgot that there was more to the email I cited.)

Ah, okay. I somehow missed that all of the devices that were reported 
to break with the MSI change in mainline don't support MSI in mainline. 
Actually, I wouldn't be surprised if this issue applied to audio on ATI 
SB450 and later, which (I think) use the hda_intel driver, which supports 
MSI (although I guess it's still defaulting to disabled). If this is true, 
it would be a regression since 2.6.19.

The addition of a quirk to not use pci_intx with MSI on ATI PCI devices 
should be safe (until 2.6.20-rc1, this was the usual kernel behavior), but 
is clearly not critical if mainline doesn't use MSI with any such devices 
anyway.

-Daniel
*This .sig left intentionally blank*


PROBLEM: setup apm as module version 2.4.34

2006-12-29 Thread Dr.-Ing. Ingo D. Rullhusen

Hello,

I hope this is the right address for this little problem, which arises
with Linux kernel 2.4.34.

If I compile Advanced Power Management as a module, it does not work. If
I run depmod, I get an unresolved symbols message, so of course the module
cannot be loaded.

But if APM is compiled directly into the kernel, it works.

Should the compile-as-module option simply be disabled?

Thanks
  Ingo



[patch] remove MAX_ARG_PAGES

2006-12-29 Thread Ingo Molnar

FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to 
2.6.20-rc2 and have included it in the -rt kernel. It's working great - 
i can now finally do a "ls -t patches/*.patch" in my patch repository - 
something i haven't been able to do for years ;-)

what is keeping this fix from going upstream?

Ingo

-->
Subject: [patch] remove MAX_ARG_PAGES
From: Ollie Wild <[EMAIL PROTECTED]>

This patch removes the MAX_ARG_PAGES limit by copying between VMs. Process
argv/env is then limited only by the stack limit (and is thus arbitrarily
sizable). No more:

  -bash: /bin/ls: Argument list too long
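With the patch as described, the space for argv/env is bounded by the stack limit instead of MAX_ARG_PAGES. A minimal sketch of reading that limit with getrlimit(RLIMIT_STACK); stack_soft_limit() is a hypothetical helper, and how much of the limit the patched kernel actually allows for arguments is not specified here, so it only reports the raw soft limit.

```c
#include <sys/resource.h>

/*
 * Report the current soft stack limit in bytes, or -1 if it is
 * unlimited or cannot be read.
 */
static long long stack_soft_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_STACK, &rl) != 0)
		return -1;
	if (rl.rlim_cur == RLIM_INFINITY)
		return -1;
	return (long long)rl.rlim_cur;
}
```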

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/x86_64/ia32/ia32_binfmt.c |   55 -
 fs/binfmt_elf.c|   12 -
 fs/binfmt_misc.c   |4 
 fs/binfmt_script.c |4 
 fs/compat.c|  118 
 fs/exec.c  |  382 +++--
 include/linux/binfmts.h|   14 -
 include/linux/mm.h |7 
 kernel/auditsc.c   |5 
 mm/mprotect.c  |2 
 mm/mremap.c|2 
 11 files changed, 250 insertions(+), 355 deletions(-)

Index: linux/arch/x86_64/ia32/ia32_binfmt.c
===
--- linux.orig/arch/x86_64/ia32/ia32_binfmt.c
+++ linux/arch/x86_64/ia32/ia32_binfmt.c
@@ -279,9 +279,6 @@ do {
\
 #define load_elf_binary load_elf32_binary
 
 #define ELF_PLAT_INIT(r, load_addr)elf32_init(r)
-#define setup_arg_pages(bprm, stack_top, exec_stack) \
-   ia32_setup_arg_pages(bprm, stack_top, exec_stack)
-int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top, 
int executable_stack);
 
 #undef start_thread
 #define start_thread(regs,new_rip,new_rsp) do { \
@@ -336,57 +333,7 @@ static void elf32_init(struct pt_regs *r
 int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top,
 int executable_stack)
 {
-   unsigned long stack_base;
-   struct vm_area_struct *mpnt;
-   struct mm_struct *mm = current->mm;
-   int i, ret;
-
-   stack_base = stack_top - MAX_ARG_PAGES * PAGE_SIZE;
-   mm->arg_start = bprm->p + stack_base;
-
-   bprm->p += stack_base;
-   if (bprm->loader)
-   bprm->loader += stack_base;
-   bprm->exec += stack_base;
-
-   mpnt = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
-   if (!mpnt) 
-   return -ENOMEM; 
-
-   memset(mpnt, 0, sizeof(*mpnt));
-
-   down_write(&mm->mmap_sem);
-   {
-   mpnt->vm_mm = mm;
-   mpnt->vm_start = PAGE_MASK & (unsigned long) bprm->p;
-   mpnt->vm_end = stack_top;
-   if (executable_stack == EXSTACK_ENABLE_X)
-   mpnt->vm_flags = VM_STACK_FLAGS |  VM_EXEC;
-   else if (executable_stack == EXSTACK_DISABLE_X)
-   mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC;
-   else
-   mpnt->vm_flags = VM_STACK_FLAGS;
-   mpnt->vm_page_prot = (mpnt->vm_flags & VM_EXEC) ? 
-   PAGE_COPY_EXEC : PAGE_COPY;
-   if ((ret = insert_vm_struct(mm, mpnt))) {
-   up_write(&mm->mmap_sem);
-   kmem_cache_free(vm_area_cachep, mpnt);
-   return ret;
-   }
-   mm->stack_vm = mm->total_vm = vma_pages(mpnt);
-   } 
-
-   for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-   struct page *page = bprm->page[i];
-   if (page) {
-   bprm->page[i] = NULL;
-   install_arg_page(mpnt, page, stack_base);
-   }
-   stack_base += PAGE_SIZE;
-   }
-   up_write(&mm->mmap_sem);
-   
-   return 0;
+   return setup_arg_pages(bprm, stack_top, executable_stack);
 }
 EXPORT_SYMBOL(ia32_setup_arg_pages);
 
Index: linux/fs/binfmt_elf.c
===
--- linux.orig/fs/binfmt_elf.c
+++ linux/fs/binfmt_elf.c
@@ -253,8 +253,8 @@ create_elf_tables(struct linux_binprm *b
size_t len;
if (__put_user((elf_addr_t)p, argv++))
return -EFAULT;
-   len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES);
-   if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
+   len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
+   if (!len || len > MAX_ARG_STRLEN)
return 0;
p += len;
}
@@ -265,8 +265,8 @@ create_elf_tables(struct linux_binprm *b
size_t len;
if (__put_user((elf_addr_t)p, envp++))
return -EFAULT;
-   len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES);
-

Re: 2.6.20-rc2: known unfixed regressions

2006-12-29 Thread Adrian Bunk
On Fri, Dec 29, 2006 at 01:07:10PM -0500, Ben Collins wrote:
> On Thu, 2006-12-28 at 23:39 +0100, Adrian Bunk wrote:
> > This email lists some known regressions in 2.6.20-rc2 compared to 2.6.19.
> > 
> > If you find your name in the Cc header, you are either submitter of one
> > of the bugs, maintainer of an affected subsystem or driver, a patch
> > of you caused a breakage or I'm considering you in any other way possibly
> > involved with one or more of these issues.
> > 
> > Due to the huge amount of recipients, please trim the Cc when answering.
> 
> > Subject: i386: Oops in __find_get_block()
> > References : http://lkml.org/lkml/2006/12/16/138
> > Submitter  : Ben Collins <[EMAIL PROTECTED]>
> >  Daniel Holbach <[EMAIL PROTECTED]>
> > Status : unknown
> 
> I believe this is the same bug as I've seen reported about gdb. I'd have
> to find the thread/information regarding it. Not sure if it was fixed
> already.

Subject: BUG at fs/buffer.c:1235 when using gdb
References : http://lkml.org/lkml/2006/12/17/134
Submitter  : Andrew J. Barr <[EMAIL PROTECTED]>
Fixed-By   : Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Commit : 8701ea957dd2a7c309e17c8dcde3a64b92d8aec0
Status : fixed in -rc2

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: 2.6.20-rc2: known unfixed regressions

2006-12-29 Thread Adrian Bunk
On Fri, Dec 29, 2006 at 01:14:13PM -0500, Daniel Barkalow wrote:

> There's also http://lkml.org/lkml/2006/12/21/47; the included patch breaks 
> my nVidia devices and probably all PCIX devices, so it's not right, but 
> something has to be done to fix ATI. My guess is a quirk to say that 
> pci_intx doesn't work on certain devices and should just be skipped, but 
> I'm not sure if it's just in combination with MSI or not.

This:
- does not seem to be a regression and
- missing MSI support is not such a big problem.

Considering how many problems patches in this area tend to cause on 
different hardware, I'm even inclined to say that such patches should 
only be added during the 2 weeks merge window before -rc1.

>   -Daniel

cu
Adrian



Re: 2.6.19 file content corruption on ext3

2006-12-29 Thread Dave Jones
On Fri, Dec 29, 2006 at 07:52:15PM +0100, maximilian attems wrote:
 
 > > The only -mm stuff I recall being in the Fedora 2.6.18 is
 > > the inode-diet stuff which ended up in 2.6.19, though the xmas
 > > break has left my head somewhat empty so I may be forgetting something.
 > > What patch in particular are you talking about?
 > 
 > it's no longer visible in the FC6 cvs, due to rebase
 >  but its name was linux-2.6-mm-tracking-dirty-pages.patch
 > it is an earlier amalgam of the merged patch series:
 >- mm: tracking shared dirty pages
 >- mm: balance dirty pages
 >- mm: optimize the new mprotect() code a bit
 >- mm: small cleanup of install_page()
 >- mm: fixup do_wp_page()
 >- mm: msync() cleanup (closes: #394392)

Ohh, that. Yes. I had forgotten all about that.
I've been hitting the nog a little too hard :)

Dave

-- 
http://www.codemonkey.org.uk


[PATCH 2/3] change libfs sb creation routines to avoid collisions with their root inodes

2006-12-29 Thread Jeff Layton
This changes the superblock creation routines that call new_inode to take steps
to avoid later collisions with other inodes that get created. I took the
approach here of not hashing things unless it was strictly necessary, though
that does mean that filesystem authors need to be careful to avoid collisions
by calling iunique properly.
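A toy model of the reservation scheme described above, assuming a tiny array in place of the inode hash and ignoring counter wraparound: numbers up to max_reserved are skipped because they may belong to unhashed inodes with fixed numbers (such as root inode 1), and any number already hashed is skipped as well. toy_sb, toy_iunique() and toy_hash() are hypothetical; this is not the kernel's iunique().

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_INODES 16	/* hypothetical capacity of this toy table */

/* Toy stand-in for a superblock's inode hash: a list of numbers. */
struct toy_sb {
	unsigned long used[MAX_INODES];
	size_t nused;
	unsigned long counter;	/* last number handed out */
};

static bool ino_in_use(const struct toy_sb *sb, unsigned long ino)
{
	for (size_t i = 0; i < sb->nused; i++)
		if (sb->used[i] == ino)
			return true;
	return false;
}

/* Return a number above max_reserved that is not already hashed. */
static unsigned long toy_iunique(struct toy_sb *sb, unsigned long max_reserved)
{
	do {
		if (++sb->counter <= max_reserved)
			sb->counter = max_reserved + 1;
	} while (ino_in_use(sb, sb->counter));
	return sb->counter;
}

/* "Hash" an inode number so later toy_iunique() calls avoid it. */
static void toy_hash(struct toy_sb *sb, unsigned long ino)
{
	if (sb->nused < MAX_INODES)
		sb->used[sb->nused++] = ino;
}
```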

Signed-off-by: Jeff Layton <[EMAIL PROTECTED]>

diff --git a/fs/libfs.c b/fs/libfs.c
index 503898d..5bdaf00 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -217,6 +217,12 @@ int get_sb_pseudo(struct file_system_type *fs_type, char 
*name,
root = new_inode(s);
if (!root)
goto Enomem;
+   /*
+* since this is the first inode, make it number 1. New inodes created
+ * after this must take care not to collide with it (by passing
+* max_reserved of 1 to iunique).
+*/
+   root->i_ino = 1;
root->i_mode = S_IFDIR | S_IRUSR | S_IWUSR;
root->i_uid = root->i_gid = 0;
root->i_atime = root->i_mtime = root->i_ctime = CURRENT_TIME;
@@ -373,6 +379,9 @@ int simple_fill_super(struct super_block *s, int magic, 
struct tree_descr *files
inode = new_inode(s);
if (!inode)
return -ENOMEM;
+   /* set to high value to try and avoid collisions with loop below */
+   inode->i_ino = 0x;
+   insert_inode_hash(inode);
inode->i_mode = S_IFDIR | 0755;
inode->i_uid = inode->i_gid = 0;
inode->i_blocks = 0;
@@ -399,6 +408,11 @@ int simple_fill_super(struct super_block *s, int magic, 
struct tree_descr *files
inode->i_blocks = 0;
inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
inode->i_fop = files->ops;
+   /*
+* no need to hash these, but you need to make sure that any
+* calls to iunique on this mount call it with a max_reserved
+* value high enough to avoid collisions with these inodes.
+*/
inode->i_ino = i;
d_add(dentry, inode);
}


[PATCH 3/3] have pipefs ensure i_ino uniqueness by calling iunique and hashing the inode

2006-12-29 Thread Jeff Layton
This converts pipefs to use the new scheme. Here we're calling iunique to get
a unique i_ino value for the new inode, and then hashing it afterward. We
call iunique with a max_reserved value of 1 to avoid collision with the root
inode.  Since the inode is now hashed, we need to take care that we end up in
generic_delete_inode rather than generic_forget_inode or we'll create a nasty
leak, so we clear_nlink when we destroy the pipe info.

I'm not certain that this is the right place to add the clear_nlink, though
it does seem to work. I'm open to suggestions on a better place to put
this, or of a better way to make sure that we end up with i_nlink == 0 at
iput time.
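A toy model of the final-iput decision Jeff describes, with hypothetical names (struct toy_inode, last_iput(), toy_clear_nlink()): an inode whose link count is zero takes the delete path, and anything else takes the forget path, which the description says would leak a hashed pipe inode. Clearing the link count when the pipe info is freed steers it onto the delete path.

```c
/* Toy inode: only the field that matters for this decision. */
struct toy_inode {
	unsigned int nlink;
};

enum final_put { PUT_DELETE, PUT_FORGET };

/* Zero links: delete outright; otherwise: forget (may stay cached). */
static enum final_put last_iput(const struct toy_inode *inode)
{
	return inode->nlink == 0 ? PUT_DELETE : PUT_FORGET;
}

/* Models clear_nlink() being called from free_pipe_info(). */
static void toy_clear_nlink(struct toy_inode *inode)
{
	inode->nlink = 0;
}
```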

Signed-off-by: Jeff Layton <[EMAIL PROTECTED]>

diff --git a/fs/pipe.c b/fs/pipe.c
index 68090e8..1d44ff0 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -825,6 +825,7 @@ void free_pipe_info(struct inode *inode)
 {
__free_pipe_info(inode->i_pipe);
inode->i_pipe = NULL;
+   clear_nlink(inode);
 }
 
 static struct vfsmount *pipe_mnt __read_mostly;
@@ -871,6 +872,8 @@ static struct inode * get_pipe_inode(void)
inode->i_uid = current->fsuid;
inode->i_gid = current->fsgid;
inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+   inode->i_ino = iunique(pipe_mnt->mnt_sb, 1);
+   insert_inode_hash(inode);
 
return inode;
 

