Re: [PATCH] chaostables

2007-03-08 Thread Jan Engelhardt
Hello,

On Thu, 08 Mar 2007 18:15:12 +0100, Patrick McHardy wrote:
Index: linux-2.6.21-rc3/net/netfilter/xt_CHAOS.c
+   /* Equivalent to:
+* -A chaos -m statistic --mode random --probability \
+* $reject_percentage -j REJECT --reject-with host-unreach;
+* -A chaos -m statistic --mode random --probability \
+* $delude_percentage -j DELUDE;
[2nd one should have had -p tcp]
+* -A chaos -j DROP;
+*/
>>>
>>>What does this do that can't be done by simply adding those individual 
>>>rules?
>>
>>It "wraps it all up", reducing the overall number of rules and user 
>>chains required in the filtering tables to implement the wanted logic. 
>>Reducing the number of filtering rules also reduces the time needed to
>>process a packet. These two are, in my opinion, a good thing.
>
>By that argument we could just codify every ruleset and put it in the
>kernel. It's three simple rules. There is no chance I'm going to take
>this part.

While that is indeed true, I think users will have a judgement (perhaps 
call it "first impression") that puts a certain set of NF rules into 
either of the two categories "this is fundamental/generic enough to 
warrant its own module" and "this does not". While

  -A INPUT -s 134.76.0.0/16 -p tcp --dport 22 -j ACCEPT;
  -A INPUT -p tcp --dport 80 -j ACCEPT;
  -A INPUT -j REJECT;

is clearly something that applies to only one machine, perhaps a
small subnet, or at best the servers on a company network; it is not
"for everyone". xt_CHAOS, on the other hand, was meant (if you will) as
a replacement for DROP/REJECT and the default policy, e.g.:

# Block all evil, even if from inside the house.
-A INPUT -m evil -j CHAOS;
# Ignore stray packets not directed at us
-A INPUT -d mybase -m this -j ACCEPT;
# Management console
-A INPUT -s yourbase -m that -j ACCEPT;
# Chain policy (instead, or supplemental to, -P INPUT DROP)
-A INPUT -j CHAOS;

(-m evil, -m this and -m that are placeholders and are not seriously
being considered for their own kernel modules anytime soon.)
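
As a rough illustration of the logic that the CHAOS target wraps up, the three rules from the patch comment can be modelled in plain userspace C. This is only a sketch: the enum, the name chaos_pick() and the idea of passing the two random draws in as arguments are assumptions made for the example, not code from xt_CHAOS.c.

```c
#include <assert.h>

/* Hypothetical verdict names standing in for the REJECT, DELUDE and DROP targets. */
enum chaos_verdict { CHAOS_REJECT, CHAOS_DELUDE, CHAOS_DROP };

/*
 * Walk the three rules in order. r1 and r2 are uniform draws in [0, 1),
 * one per "-m statistic --mode random" rule; the caller supplies them so
 * the function stays deterministic and easy to test.
 */
enum chaos_verdict chaos_pick(double r1, double r2,
			      double reject_p, double delude_p, int is_tcp)
{
	assert(reject_p >= 0.0 && reject_p <= 1.0);
	assert(delude_p >= 0.0 && delude_p <= 1.0);

	if (r1 < reject_p)		/* rule 1: REJECT with host-unreach */
		return CHAOS_REJECT;
	if (is_tcp && r2 < delude_p)	/* rule 2: DELUDE, TCP only */
		return CHAOS_DELUDE;
	return CHAOS_DROP;		/* rule 3: everything else */
}
```

With both probabilities at 0.25, a TCP packet that misses the first draw but hits the second is deluded, while a non-TCP packet with the same draws is dropped.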

For me, this falls under generic-enough, but your (and other people's) mileage
might vary.


Thank you for the comments,

Jan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] swsusp: Disable nonboot CPUs before entering platform suspend

2007-03-08 Thread Rafael J. Wysocki
On Wednesday, 7 March 2007 22:07, Pavel Machek wrote:
> Hi!
> 
> > Prevent the WARN_ON() in arch/x86_64/kernel/acpi/sleep.c:init_low_mapping()
> > from triggering by disabling nonboot CPUs before we finally enter the 
> > platform
> > suspend.
> > 
> > Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > ---
> >  kernel/power/disk.c |1 +
> >  kernel/power/user.c |2 +-
> >  2 files changed, 2 insertions(+), 1 deletion(-)
> > 
> > Index: linux-2.6.21-rc2-mm2/kernel/power/disk.c
> > ===
> > --- linux-2.6.21-rc2-mm2.orig/kernel/power/disk.c
> > +++ linux-2.6.21-rc2-mm2/kernel/power/disk.c
> > @@ -61,6 +61,7 @@ static void power_down(suspend_disk_meth
> > switch(mode) {
> > case PM_DISK_PLATFORM:
> > if (pm_ops && pm_ops->enter) {
> > +   disable_nonboot_cpus();
> > kernel_shutdown_prepare(SYSTEM_SUSPEND_DISK);
> > pm_ops->enter(PM_SUSPEND_DISK);
> > break;
> 
> ...so, if pm_ops is non-null, power_down does nonboot cpu disabling,
> otherwise we proceed with cpus enabled?
> 
> That looks ugly.
> 
> Is the warning bogus?

Well, maybe.  I'm not sure.

> Or maybe we should *always* disable nonboot cpus in powerdown path?

I think we should do that.

> > Index: linux-2.6.21-rc2-mm2/kernel/power/user.c
> > ===
> > --- linux-2.6.21-rc2-mm2.orig/kernel/power/user.c
> > +++ linux-2.6.21-rc2-mm2/kernel/power/user.c
> > @@ -398,9 +398,9 @@ static int snapshot_ioctl(struct inode *
> >  
> > case PMOPS_ENTER:
> > if (data->platform_suspend) {
> > +   disable_nonboot_cpus();
> > kernel_shutdown_prepare(SYSTEM_SUSPEND_DISK);
> > error = pm_ops->enter(PM_SUSPEND_DISK);
> > -   error = 0;
> > }
> > break;
> 
> For a userland application, disabling cpus during pmops_enter is at
> least surprising...

Yes, but this is not a usual ioctl().  OTOH, we can call enable_nonboot_cpus()
if pm_ops->enter(PM_SUSPEND_DISK) returns an error (otherwise it shouldn't
return at all, no?).

Rafael


Re: [PATCH] Use more gcc extensions in the Linux headers

2007-03-08 Thread Christoph Hellwig
On Fri, Mar 09, 2007 at 09:50:56AM +0300, Andrey Panin wrote:
> On 068, 03 09, 2007 at 04:56:32PM +1100, Rusty Russell wrote:
> > __builtin_types_compatible_p() has been around since gcc 2.95,
> 
> but it's not available in Intel C compiler IIRC :(

So what?



Re: [PATCH] Use more gcc extensions in the Linux headers

2007-03-08 Thread Christoph Hellwig
On Fri, Mar 09, 2007 at 04:56:32PM +1100, Rusty Russell wrote:
> __builtin_types_compatible_p() has been around since gcc 2.95, and we
> don't use it anywhere.  This patch quietly fixes that.
> 
> Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
> 
> diff -r f0ff8138f993 include/linux/kernel.h
> --- a/include/linux/kernel.h  Fri Mar 09 16:40:25 2007 +1100
> +++ b/include/linux/kernel.h  Fri Mar 09 16:44:04 2007 +1100
> @@ -35,7 +35,9 @@ extern const char linux_proc_banner[];
>  #define ALIGN(x,a)   __ALIGN_MASK(x,(typeof(x))(a)-1)
>  #define __ALIGN_MASK(x,mask) (((x)+(mask))&~(mask))
>  
> -#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
> +#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0])                 \
> + + sizeof(typeof(int[1 - 2*!!__builtin_types_compatible_p(typeof(arr), \
> +  typeof(&arr[0]))]))*0)

This needs a comment explaining why we're doing this, and maybe a little
explanation of the combination of gcc magic and C trickery used to implement
it, for the brave non-uberhacker people trying to understand the Linux headers.
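
For readers who want the trick spelled out, here is a commented sketch of how such a macro can work. It is GCC/Clang-specific and assumes only the documented behaviour of __builtin_types_compatible_p() and __typeof__; the helper macro IS_POINTER_NOT_ARRAY and the demo function are names invented for this example, and the argument is parenthesized slightly more defensively than in the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * For a real array, typeof(arr) is "T[N]" and typeof(&arr[0]) is "T *":
 * the types differ, so the builtin yields 0. For a pointer, both are
 * "T *", so it yields 1.
 */
#define IS_POINTER_NOT_ARRAY(arr) \
	__builtin_types_compatible_p(__typeof__(arr), __typeof__(&(arr)[0]))

/*
 * The second term is sizeof(int[1]) * 0 for an array, i.e. zero, so the
 * result is unchanged. For a pointer it becomes sizeof(int[-1]), which
 * is a compile-time error, so the misuse never builds.
 */
#define ARRAY_SIZE(arr) \
	(sizeof(arr) / sizeof((arr)[0]) \
	 + sizeof(__typeof__(int[1 - 2 * !!IS_POINTER_NOT_ARRAY(arr)])) * 0)

size_t count_demo(void)
{
	int table[5] = { 0 };

	return ARRAY_SIZE(table);	/* int *p = table; ARRAY_SIZE(p) would not compile */
}
```

The plain sizeof division silently returns a wrong count when handed a pointer; the extra term turns that mistake into a build failure at no runtime cost.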



Re: [PATCH 1/9] lguest: block device speedup

2007-03-08 Thread Christoph Hellwig
On Fri, Mar 09, 2007 at 02:05:24PM +1100, Rusty Russell wrote:
> diff -r fdc8cbc1fd61 drivers/block/lguest_blk.c
> --- a/drivers/block/lguest_blk.c  Thu Mar 08 13:35:39 2007 +1100
> +++ b/drivers/block/lguest_blk.c  Thu Mar 08 15:51:55 2007 +1100
> @@ -45,6 +45,16 @@ struct blockdev
>   struct request *req;
>  };
>  
> +/* Jens gave me this nice helper to end all chunks of a request. */
> +static void end_entire_request(struct request *req, int uptodate)
> +{
> + if (end_that_request_first(req, uptodate, req->hard_nr_sectors))
> + BUG();
> + add_disk_randomness(req->rq_disk);
> + blkdev_dequeue_request(req);
> + end_that_request_last(req, uptodate);
> +}

I think we really want this in common code; ll_rw_blk.c should have:

static int __end_request(struct request *req, int uptodate,
			 unsigned int sectors)
{
	if (!end_that_request_first(req, uptodate, sectors)) {
		add_disk_randomness(req->rq_disk);
		blkdev_dequeue_request(req);
		end_that_request_last(req, uptodate);
		return 1;
	}
	return 0;
}

/* TODO: add kerneldoc comment */
/* XXX: should be called end_partial_request */
void end_request(struct request *req, int uptodate)
{
	__end_request(req, uptodate, req->hard_cur_sectors);
}

/* TODO: add kerneldoc comment */
void end_entire_request(struct request *req, int uptodate)
{
	if (!__end_request(req, uptodate, req->hard_nr_sectors))
		BUG();
}

The latter two could maybe be inlines.


Re: [PATCH] swsusp: Disable nonboot CPUs before entering platform suspend

2007-03-08 Thread Pavel Machek
Hi!

> Prevent the WARN_ON() in arch/x86_64/kernel/acpi/sleep.c:init_low_mapping()
> from triggering by disabling nonboot CPUs before we finally enter the platform
> suspend.
> 
> Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
> ---
>  kernel/power/disk.c |1 +
>  kernel/power/user.c |2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6.21-rc2-mm2/kernel/power/disk.c
> ===
> --- linux-2.6.21-rc2-mm2.orig/kernel/power/disk.c
> +++ linux-2.6.21-rc2-mm2/kernel/power/disk.c
> @@ -61,6 +61,7 @@ static void power_down(suspend_disk_meth
>   switch(mode) {
>   case PM_DISK_PLATFORM:
>   if (pm_ops && pm_ops->enter) {
> + disable_nonboot_cpus();
>   kernel_shutdown_prepare(SYSTEM_SUSPEND_DISK);
>   pm_ops->enter(PM_SUSPEND_DISK);
>   break;

...so, if pm_ops is non-null, power_down does nonboot cpu disabling,
otherwise we proceed with cpus enabled?

That looks ugly.

Is the warning bogus? Or maybe we should *always* disable nonboot cpus
in powerdown path?

> Index: linux-2.6.21-rc2-mm2/kernel/power/user.c
> ===
> --- linux-2.6.21-rc2-mm2.orig/kernel/power/user.c
> +++ linux-2.6.21-rc2-mm2/kernel/power/user.c
> @@ -398,9 +398,9 @@ static int snapshot_ioctl(struct inode *
>  
>   case PMOPS_ENTER:
>   if (data->platform_suspend) {
> + disable_nonboot_cpus();
>   kernel_shutdown_prepare(SYSTEM_SUSPEND_DISK);
>   error = pm_ops->enter(PM_SUSPEND_DISK);
> - error = 0;
>   }
>   break;

For a userland application, disabling cpus during pmops_enter is at
least surprising...

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH -mm] utrace: nommu fixup support utrace

2007-03-08 Thread Roland McGrath
I understand the NOMMU situation, and you are already screwed by
PTRACE_ATTACH.  What I meant to suggest is that I would start from a
safety point of view with get_user_pages/access_process_vm refusing to
do force&&write to MAP_PRIVATE pages that are in fact being shared
(ETXTBSY or something).  (When it's not being shared, it should do
whatever is necessary to make sure that page is known dirty and not
hand it out for later mappings.)  Then you can go about trying to make
the safe (no sharing) case come about when you want it.  You still
won't win with PTRACE_ATTACH and the like unless you happen not to
have sharing in the places you insert your breakpoints at the time.
But at least the debugger will just lose, instead of breaking
unsuspecting processes.  With the utrace patches, you can approximate
the ptrace check you had with something like:

if (tracehook_consider_fatal_signal(current, SIGTRAP))

or whatever signal you think poking text might result in that the
debugger will be looking for (atm it doesn't actually matter what
signo you pass).  This returns true when ptrace is in use, and
probably also for later utrace-based ways a debugger attaches if it is
expecting ahead of time to be debugging heavily as with breakpoints.
(And that's about the best you can do for a single address space system.)
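
The safety-first policy described above (refuse a forced write to a private page that is actually shared, rather than corrupting another process) can be sketched as a small userspace predicate. This is not get_user_pages() code: the function name and its parameters, in particular the "sharers" count, are assumptions invented for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Decide whether a debugger-style forced write may go ahead.
 * map_private: the page belongs to a MAP_PRIVATE mapping
 * sharers:     hypothetical count of mappings currently using the page
 * Returns 0 if the write is allowed, -ETXTBSY if it is refused.
 */
int check_forced_write(bool map_private, int sharers, bool write, bool force)
{
	assert(sharers >= 1);

	if (write && force && map_private && sharers > 1)
		return -ETXTBSY;	/* the debugger loses, nobody else breaks */
	return 0;
}
```

In the unshared case the write is allowed; the sketch leaves out the extra step of marking the page dirty so that it is not handed out to later mappings.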


Thanks,
Roland


Re: [lm-sensors] Could the k8temp driver be interfering with ACPI?

2007-03-08 Thread Pavel Machek
Hi!

> Port (and memory) addresses can be dynamically generated by the AML code
> and thus, there is no way that the ACPI subsystem can statically predict
> any addresses that will be accessed by the AML.

Can you take this as a wishlist item?

It would be nice if next version of acpi specs supported table

'AML / SMM BIOS will access these ports'

...so we can get it correct with acpi4 or something..?

Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


[PATCH] drivers/media/video/se401.c: check kmalloc() return value.

2007-03-08 Thread Amit Choudhary
Description: Check the return value of kmalloc() in function 
se401_start_stream(), in file drivers/media/video/se401.c.

Signed-off-by: Amit Choudhary <[EMAIL PROTECTED]>

diff --git a/drivers/media/video/se401.c b/drivers/media/video/se401.c
index 7aeec57..006c818 100644
--- a/drivers/media/video/se401.c
+++ b/drivers/media/video/se401.c
@@ -450,6 +450,13 @@ static int se401_start_stream(struct usb
}
for (i=0; isbuf[i].data=kmalloc(SE401_PACKETSIZE, GFP_KERNEL);
+   if (!se401->sbuf[i].data) {
+   for(i = i - 1; i >= 0; i--) {
+   kfree(se401->sbuf[i].data);
+   se401->sbuf[i].data = NULL;
+   }
+   return -ENOMEM;
+   }
}
 
se401->bayeroffset=0;
@@ -458,13 +465,26 @@ static int se401_start_stream(struct usb
se401->scratch_overflow=0;
for (i=0; iscratch[i].data=kmalloc(SE401_PACKETSIZE, GFP_KERNEL);
+   if (!se401->scratch[i].data) {
+   for(i = i - 1; i >= 0; i--) {
+   kfree(se401->scratch[i].data);
+   se401->scratch[i].data = NULL;
+   }
+   goto nomem_sbuf;
+   }
se401->scratch[i].state=BUFFER_UNUSED;
}
 
for (i=0; i= 0; i--) {
+   usb_kill_urb(se401->urb[i]);
+   usb_free_urb(se401->urb[i]);
+   se401->urb[i] = NULL;
+   }
+   goto nomem_scratch;
+   }
 
usb_fill_bulk_urb(urb, se401->dev,
usb_rcvbulkpipe(se401->dev, SE401_VIDEO_ENDPOINT),
@@ -482,6 +502,18 @@ static int se401_start_stream(struct usb
se401->framecount=0;
 
return 0;
+
+ nomem_scratch:
+   for (i=0; iscratch[i].data);
+   se401->scratch[i].data = NULL;
+   }
+ nomem_sbuf:
+   for (i=0; isbuf[i].data);
+   se401->sbuf[i].data = NULL;
+   }
+   return -ENOMEM;
 }
 
 static int se401_stop_stream(struct usb_se401 *se401)


Re: [PATCH] net/wanrouter/wanmain.c: check kmalloc() return value.

2007-03-08 Thread David Miller
From: Amit Choudhary <[EMAIL PROTECTED]>
Date: Thu, 8 Mar 2007 23:26:54 -0800

> Description: Check the return value of kmalloc() in function dbg_kmalloc(), 
> in file net/wanrouter/wanmain.c.
> 
> Signed-off-by: Amit Choudhary <[EMAIL PROTECTED]>

There is no reason for any subsystem to implement its
own debugging allocator when we have one that works
perfectly fine already in SLAB.

So to fix this we should simply remove all of the
allocation debugging code here.


Re: [PATCH] Fix avr32 TIF atomicity in do_debug_priv

2007-03-08 Thread Haavard Skinnemoen
On Thu, 8 Mar 2007 22:21:37 -0500
Mathieu Desnoyers <[EMAIL PROTECTED]> wrote:

> Fix avr32 TIF atomicity in do_debug_priv
> 
> avr32 updates the thread flags 1 - non atomically and 2 - with the wrong value
> (for TIF_SINGLE_STEP) in this function.
> 
> It applies to 2.6.20.

Thanks, but this has already been fixed by a19b4a14053f for 2.6.21.

Haavard


[PATCH] drivers/char/vt.c: check kmalloc() return value.

2007-03-08 Thread Amit Choudhary
Description: Check the return value of kmalloc() in function con_init(), in 
file drivers/char/vt.c.

Signed-off-by: Amit Choudhary <[EMAIL PROTECTED]>

diff --git a/drivers/char/vt.c b/drivers/char/vt.c
index 87587b4..6aa08cb 100644
--- a/drivers/char/vt.c
+++ b/drivers/char/vt.c
@@ -2640,6 +2640,15 @@ static int __init con_init(void)
 */
for (currcons = 0; currcons < MIN_NR_CONSOLES; currcons++) {
vc_cons[currcons].d = vc = alloc_bootmem(sizeof(struct 
vc_data));
+   if (!vc_cons[currcons].d) {
+   for (--currcons; currcons >= 0; currcons--) {
+   kfree(vc_cons[currcons].d);
+   vc_cons[currcons].d = NULL;
+   }
+   release_console_sem();
+   return -ENOMEM;
+   }
+
visual_init(vc, currcons, 1);
vc->vc_screenbuf = (unsigned short 
*)alloc_bootmem(vc->vc_screenbuf_size);
vc->vc_kmalloced = 0;


[PATCH] fs/jffs2/scan.c: Fix error-path leak

2007-03-08 Thread Amit Choudhary
Description: Fix error-path leak in function jffs2_scan_medium(), in file 
fs/jffs2/scan.c

Signed-off-by: Amit Choudhary <[EMAIL PROTECTED]>

diff --git a/fs/jffs2/scan.c b/fs/jffs2/scan.c
index e241346..cd9ed6e 100644
--- a/fs/jffs2/scan.c
+++ b/fs/jffs2/scan.c
@@ -130,6 +130,8 @@ #endif
if (jffs2_sum_active()) {
s = kmalloc(sizeof(struct jffs2_summary), GFP_KERNEL);
if (!s) {
+   free(flashbuf);
+   flashbuf = NULL;
JFFS2_WARNING("Can't allocate memory for summary\n");
return -ENOMEM;
}


[PATCH] drivers/char/agp/sgi-agp.c: check kmalloc() return value.

2007-03-08 Thread Amit Choudhary
Description: Check the return value of kmalloc() in function agp_sgi_init(), in 
file drivers/char/agp/sgi-agp.c.

Signed-off-by: Amit Choudhary <[EMAIL PROTECTED]>

diff --git a/drivers/char/agp/sgi-agp.c b/drivers/char/agp/sgi-agp.c
index d73be4c..5897e6c 100644
--- a/drivers/char/agp/sgi-agp.c
+++ b/drivers/char/agp/sgi-agp.c
@@ -285,6 +285,8 @@ static int __devinit agp_sgi_init(void)
(struct agp_bridge_data **)kmalloc(tioca_gart_found *
   sizeof(struct agp_bridge_data *),
   GFP_KERNEL);
+   if (!sgi_tioca_agp_bridges)
+   return -ENOMEM;
 
j = 0;
list_for_each_entry(info, _list, ca_list) {


Re: [PATCH] [REVISED] net/ipv4/multipath_wrandom.c: check kmalloc() return value.

2007-03-08 Thread David Miller
From: Amit Choudhary <[EMAIL PROTECTED]>
Date: Thu, 8 Mar 2007 23:22:15 -0800

> Description: Check the return value of kmalloc() in function 
> wrandom_set_nhinfo(), in file net/ipv4/multipath_wrandom.c.
> 
> Signed-off-by: Amit Choudhary <[EMAIL PROTECTED]>

This kind of patch has been submitted several times before and it's
never accepted because you have to do much more than this to recover
from the allocation error.

There is no error status returned to the caller, so the callers assume
the operation succeeded, and will either OOPS or crash in some other
way.

Therefore, just adding some NULL pointer checks and returning is not
going to fix this bug.

The whole cache-multipath subsystem has to have its guts revamped for
proper error handling.
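
To make the objection concrete, here is a small userspace sketch of what "returning an error status to the caller" involves. It is not the multipath code: struct route, set_route_checked() and install_route() are hypothetical names, and the simulate_failure flag exists only so the failure path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct route {
	int gw;
	int oif;
};

/* Unlike a void setter, this reports allocation failure to its caller. */
int set_route_checked(struct route **slot, int gw, int oif, int simulate_failure)
{
	struct route *r;

	assert(slot != NULL);
	r = simulate_failure ? NULL : malloc(sizeof(*r));
	if (!r)
		return -ENOMEM;
	r->gw = gw;
	r->oif = oif;
	*slot = r;
	return 0;
}

/* The caller has to look at the status before touching the result. */
int install_route(int simulate_failure)
{
	struct route *slot = NULL;
	int err = set_route_checked(&slot, 1, 2, simulate_failure);

	if (err)
		return err;	/* back out instead of dereferencing NULL later */
	err = (slot->gw == 1 && slot->oif == 2) ? 0 : -EINVAL;
	free(slot);
	return err;
}
```

A NULL check inside a void function only moves the crash: the caller still believes the route was installed. Both the callee's signature and every caller have to change, which is the larger rework referred to above.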


[PATCH] net/wanrouter/wanmain.c: check kmalloc() return value.

2007-03-08 Thread Amit Choudhary
Description: Check the return value of kmalloc() in function dbg_kmalloc(), in 
file net/wanrouter/wanmain.c.

Signed-off-by: Amit Choudhary <[EMAIL PROTECTED]>

diff --git a/net/wanrouter/wanmain.c b/net/wanrouter/wanmain.c
index 316211d..263450c 100644
--- a/net/wanrouter/wanmain.c
+++ b/net/wanrouter/wanmain.c
@@ -67,6 +67,8 @@ static void * dbg_kmalloc(unsigned int s
int i = 0;
void * v = kmalloc(size+sizeof(unsigned int)+2*KMEM_SAFETYZONE*8,prio);
char * c1 = v;
+   if (!v)
+   return NULL;
c1 += sizeof(unsigned int);
*((unsigned int *)v) = size;
 


Re: [PATCH 0/2] NET: Multiple queue network device support REPOST

2007-03-08 Thread David Miller
From: "Waskiewicz Jr, Peter P" <[EMAIL PROTECTED]>
Date: Thu, 8 Mar 2007 23:16:58 -0800

> It seems expensive to change all the skb's if this type of
> event occurs,

The reset functions have to walk all the SKBs anyways.


[PATCH] drivers/atm/fore200e.c: change in error message.

2007-03-08 Thread Amit Choudhary
Description: Change in error message in function fore200e_kmalloc(), in file 
drivers/atm/fore200e.c.

Signed-off-by: Amit Choudhary <[EMAIL PROTECTED]>

diff --git a/drivers/atm/fore200e.c b/drivers/atm/fore200e.c
index 3a7b21f..1c7ea02 100644
--- a/drivers/atm/fore200e.c
+++ b/drivers/atm/fore200e.c
@@ -178,7 +178,7 @@ fore200e_kmalloc(int size, gfp_t flags)
 void *chunk = kzalloc(size, flags);
 
 if (!chunk)
-   printk(FORE200E "kmalloc() failed, requested size = %d, flags = 
0x%x\n",size, flags);
+   printk(FORE200E "kzalloc() failed, requested size = %d, flags = 
0x%x\n",size, flags);
 
 return chunk;
 }


[PATCH] drivers/char/synclink.c: check kmalloc() return value.

2007-03-08 Thread Amit Choudhary
Description: Check the return value of kmalloc() in function 
mgsl_alloc_intermediate_txbuffer_memory(), in file drivers/char/synclink.c.

Signed-off-by: Amit Choudhary <[EMAIL PROTECTED]>

diff --git a/drivers/char/synclink.c b/drivers/char/synclink.c
index 06784ad..24f99bc 100644
--- a/drivers/char/synclink.c
+++ b/drivers/char/synclink.c
@@ -4012,8 +4012,13 @@ static int mgsl_alloc_intermediate_txbuf
for ( i=0; i<info->num_tx_holding_buffers; ++i) {
info->tx_holding_buffers[i].buffer =
kmalloc(info->max_frame_size, GFP_KERNEL);
-   if ( info->tx_holding_buffers[i].buffer == NULL )
+   if (info->tx_holding_buffers[i].buffer == NULL) {
+   for (--i; i >= 0; i--) {
+   kfree(info->tx_holding_buffers[i].buffer);
+   info->tx_holding_buffers[i].buffer = NULL;
+   }
return -ENOMEM;
+   }
}
 
return 0;
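
The unwind loop added by this patch, "for (--i; i >= 0; i--)", only terminates if the index is a signed type. The userspace sketch below shows the same pattern with malloc()/free(); NBUF, alloc_buffers(), demo_unwind() and the fail_at test hook are names invented for the example, and whether the driver's own index is signed is not visible in the hunk above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define NBUF 4

/* fail_at simulates an allocation failure at that index (-1: never fail). */
int alloc_buffers(void *bufs[NBUF], size_t size, int fail_at)
{
	int i;	/* signed on purpose: the unwind loop must be able to reach -1 */

	assert(bufs != NULL);
	for (i = 0; i < NBUF; i++) {
		bufs[i] = (i == fail_at) ? NULL : malloc(size);
		if (bufs[i] == NULL) {
			/* Free slots 0..i-1, newest first. With an unsigned
			 * index, "i >= 0" would always be true. */
			for (--i; i >= 0; i--) {
				free(bufs[i]);
				bufs[i] = NULL;
			}
			return -ENOMEM;
		}
	}
	return 0;
}

/* Returns how many slots are still allocated after the call, then frees them. */
int demo_unwind(int fail_at)
{
	void *bufs[NBUF] = { NULL };
	int i, live = 0;

	alloc_buffers(bufs, 32, fail_at);
	for (i = 0; i < NBUF; i++) {
		if (bufs[i] != NULL) {
			live++;
			free(bufs[i]);
		}
	}
	return live;
}
```

After a simulated failure at index 2, no slot is left allocated, so nothing leaks and nothing is freed twice.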


[PATCH] sound/oss/i810_audio.c: check kmalloc() return value.

2007-03-08 Thread Amit Choudhary
Description: Check the return value of kmalloc() in function i810_open(), in 
file sound/oss/i810_audio.c.

Signed-off-by: Amit Choudhary <[EMAIL PROTECTED]>

diff --git a/sound/oss/i810_audio.c b/sound/oss/i810_audio.c
index 240cc79..a415967 100644
--- a/sound/oss/i810_audio.c
+++ b/sound/oss/i810_audio.c
@@ -2580,8 +2580,13 @@ static int i810_open(struct inode *inode
if (card->states[i] == NULL) {
state = card->states[i] = (struct i810_state *)
kmalloc(sizeof(struct i810_state), 
GFP_KERNEL);
-   if (state == NULL)
+   if (state == NULL) {
+   for (--i; i >= 0; i--) {
+   kfree(card->states[i]);
+   card->states[i] = NULL;
+   }
return -ENOMEM;
+   }
memset(state, 0, sizeof(struct i810_state));
dmabuf = &state->dmabuf;
goto found_virt;


Re: [PATCH 1/2] NET: Multiple queue network device support

2007-03-08 Thread Jarek Poplawski
On 07-03-2007 23:42, David Miller wrote:
> I didn't say to use skb->priority, I said to shrink skb->priority down
> to a u16 and then make another u16 which will store your queue mapping
> value.

Peter is right: this is fully used by schedulers (prio,
CBQ, HTB, HFSC...) and would break users' scripts, so I
wouldn't recommend it either.
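
A small sketch of why narrowing the field matters, assuming (as with the usual tc class handles) that a scheduler may store a 32-bit "major:minor" class id in the priority, major in the upper 16 bits. The macro and function names below are made up for the example and are not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

/* Handle layout assumed here: major in bits 31..16, minor in bits 15..0. */
#define H_MAJ(h) ((h) & 0xFFFF0000U)
#define H_MIN(h) ((h) & 0x0000FFFFU)

uint32_t make_classid(uint16_t major, uint16_t minor)
{
	return ((uint32_t)major << 16) | minor;
}

/* What would survive if the priority field were shrunk to 16 bits. */
uint16_t narrow_priority(uint32_t priority)
{
	return (uint16_t)priority;
}
```

For class 1:10 (hex) the full value is 0x00010010; after narrowing only 0x0010 is left, so the major number a classful qdisc would compare against its own handle is gone.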

Regards,
Jarek P.


[PATCH] [REVISED] net/ipv4/multipath_wrandom.c: check kmalloc() return value.

2007-03-08 Thread Amit Choudhary
Description: Check the return value of kmalloc() in function 
wrandom_set_nhinfo(), in file net/ipv4/multipath_wrandom.c.

Signed-off-by: Amit Choudhary <[EMAIL PROTECTED]>

diff --git a/net/ipv4/multipath_wrandom.c b/net/ipv4/multipath_wrandom.c
index 92b0482..bcdb1f1 100644
--- a/net/ipv4/multipath_wrandom.c
+++ b/net/ipv4/multipath_wrandom.c
@@ -242,6 +242,9 @@ static void wrandom_set_nhinfo(__be32 ne
target_route = (struct multipath_route *)
kmalloc(size_rt, GFP_ATOMIC);
 
+   if (!target_route)
+   goto error;
+
target_route->gw = nh->nh_gw;
target_route->oif = nh->nh_oif;
memset(&target_route->rcu, 0, sizeof(struct rcu_head));
@@ -263,6 +266,9 @@ static void wrandom_set_nhinfo(__be32 ne
target_dest = (struct multipath_dest*)
kmalloc(size_dst, GFP_ATOMIC);
 
+   if (!target_dest)
+   goto error;
+
target_dest->nh_info = nh;
target_dest->network = network;
target_dest->netmask = netmask;
@@ -275,6 +281,7 @@ static void wrandom_set_nhinfo(__be32 ne
 * we are finished
 */
 
+ error:
spin_unlock_bh([state_idx].lock);
 }
 


[PATCH] [REVISED] drivers/media/video/stv680.c: check kmalloc() return value.

2007-03-08 Thread Amit Choudhary
Description: Check the return value of kmalloc() in function 
stv680_start_stream(), in file drivers/media/video/stv680.c.

Signed-off-by: Amit Choudhary <[EMAIL PROTECTED]>

diff --git a/drivers/media/video/stv680.c b/drivers/media/video/stv680.c
index 6d1ef1e..f35c664 100644
--- a/drivers/media/video/stv680.c
+++ b/drivers/media/video/stv680.c
@@ -687,7 +687,11 @@ static int stv680_start_stream (struct u
stv680->sbuf[i].data = kmalloc (stv680->rawbufsize, GFP_KERNEL);
if (stv680->sbuf[i].data == NULL) {
PDEBUG (0, "STV(e): Could not kmalloc raw data buffer 
%i", i);
-   return -1;
+   for (i = i - 1; i >= 0; i--) {
+   kfree(stv680->sbuf[i].data);
+   stv680->sbuf[i].data = NULL;
+   }
+   return -ENOMEM;
}
}
 
@@ -698,15 +702,25 @@ static int stv680_start_stream (struct u
stv680->scratch[i].data = kmalloc (stv680->rawbufsize, 
GFP_KERNEL);
if (stv680->scratch[i].data == NULL) {
PDEBUG (0, "STV(e): Could not kmalloc raw scratch 
buffer %i", i);
-   return -1;
+   for (i = i - 1; i >= 0; i--) {
+   kfree(stv680->scratch[i].data);
+   stv680->scratch[i].data = NULL;
+   }
+   goto nomem_sbuf;
}
stv680->scratch[i].state = BUFFER_UNUSED;
}
 
for (i = 0; i < STV680_NUMSBUF; i++) {
urb = usb_alloc_urb (0, GFP_KERNEL);
-   if (!urb)
-   return -ENOMEM;
+   if (!urb) {
+   for (i = i - 1; i >= 0; i--) {
+   usb_kill_urb(stv680->urb[i]);
+   usb_free_urb(stv680->urb[i]);
+   stv680->urb[i] = NULL;
+   }
+   goto nomem_scratch;
+   }
 
/* sbuf is urb->transfer_buffer, later gets memcpyed to scratch 
*/
usb_fill_bulk_urb (urb, stv680->udev,
@@ -721,6 +735,18 @@ static int stv680_start_stream (struct u
 
stv680->framecount = 0;
return 0;
+
+ nomem_scratch:
+   for (i=0; i<STV680_NUMSCRATCH; i++) {
+   kfree(stv680->scratch[i].data);
+   stv680->scratch[i].data = NULL;
+   }
+ nomem_sbuf:
+   for (i=0; i<STV680_NUMSBUF; i++) {
+   kfree(stv680->sbuf[i].data);
+   stv680->sbuf[i].data = NULL;
+   }
+   return -ENOMEM;
 }
 
 static int stv680_stop_stream (struct usb_stv *stv680)
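
The label-based cleanup in this patch follows the common "goto unwind" idiom. Below is a userspace sketch of a variation on it, not the driver code: the slots are cleared up front so that each label can simply free its whole array (free(NULL), like kfree(NULL), does nothing). struct stream, the NUM_* sizes, the function names and the fail_stage test hook are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define NUM_SBUF    2
#define NUM_SCRATCH 3

struct stream {
	void *sbuf[NUM_SBUF];
	void *scratch[NUM_SCRATCH];
};

/* fail_stage: 0 = no failure, 1 = fail in the sbuf loop, 2 = fail in the scratch loop. */
int start_stream(struct stream *s, size_t size, int fail_stage)
{
	int i;

	assert(s != NULL);
	/* Start from all-NULL so the cleanup loops may free every slot. */
	for (i = 0; i < NUM_SBUF; i++)
		s->sbuf[i] = NULL;
	for (i = 0; i < NUM_SCRATCH; i++)
		s->scratch[i] = NULL;

	for (i = 0; i < NUM_SBUF; i++) {
		s->sbuf[i] = (fail_stage == 1 && i == 1) ? NULL : malloc(size);
		if (!s->sbuf[i])
			goto nomem_sbuf;
	}
	for (i = 0; i < NUM_SCRATCH; i++) {
		s->scratch[i] = (fail_stage == 2 && i == 1) ? NULL : malloc(size);
		if (!s->scratch[i])
			goto nomem_scratch;
	}
	return 0;

nomem_scratch:	/* later stage: falls through into the earlier stage's cleanup */
	for (i = 0; i < NUM_SCRATCH; i++) {
		free(s->scratch[i]);	/* free(NULL) is a no-op */
		s->scratch[i] = NULL;
	}
nomem_sbuf:
	for (i = 0; i < NUM_SBUF; i++) {
		free(s->sbuf[i]);
		s->sbuf[i] = NULL;
	}
	return -ENOMEM;
}

/* Returns start_stream()'s result after checking that a failure left nothing allocated. */
int demo_start(int fail_stage)
{
	struct stream s;
	int i, err = start_stream(&s, 64, fail_stage);

	for (i = 0; i < NUM_SBUF; i++) {
		if (err)
			assert(s.sbuf[i] == NULL);
		free(s.sbuf[i]);
	}
	for (i = 0; i < NUM_SCRATCH; i++) {
		if (err)
			assert(s.scratch[i] == NULL);
		free(s.scratch[i]);
	}
	return err;
}
```

Ordering the labels in reverse order of allocation lets a late failure fall through every earlier cleanup step, so each resource is released exactly once.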


[PATCH] mm/slab.c: check kmalloc() return value.

2007-03-08 Thread Amit Choudhary
Description: Check the return value of kmalloc() in function setup_cpu_cache(), 
in file mm/slab.c.

Signed-off-by: Amit Choudhary <[EMAIL PROTECTED]>

diff --git a/mm/slab.c b/mm/slab.c
index 84c631f..613ae61 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2021,6 +2021,7 @@ static int setup_cpu_cache(struct kmem_c
} else {
cachep->array[smp_processor_id()] =
kmalloc(sizeof(struct arraycache_init), GFP_KERNEL);
+   BUG_ON(!cachep->array[smp_processor_id()]);
 
if (g_cpucache_up == PARTIAL_AC) {
set_up_list3s(cachep, SIZE_L3);


[PATCH] fs/cifs/readdir.c: check kmalloc() return value.

2007-03-08 Thread Amit Choudhary
Description: Check the return value of kmalloc() in function cifs_readdir(), in 
file fs/cifs/readdir.c.

Signed-off-by: Amit Choudhary <[EMAIL PROTECTED]>

diff --git a/fs/cifs/readdir.c b/fs/cifs/readdir.c
index b5b0a2a..2d43b2a 100644
--- a/fs/cifs/readdir.c
+++ b/fs/cifs/readdir.c
@@ -1063,6 +1063,11 @@ int cifs_readdir(struct file *file, void
such multibyte target UTF-8 characters. cifs_unicode.c,
which actually does the conversion, has the same limit */
tmp_buf = kmalloc((2 * NAME_MAX) + 4, GFP_KERNEL);
+   if (!tmp_buf) {
+   cERROR(1, ("No memory!"));
+   rc = -ENOMEM;
+   goto rddir2_exit;
+   }
for(i=0;(i


[PATCH] drivers/usb/serial/mos7840.c: check kmalloc() return value.

2007-03-08 Thread Amit Choudhary
Description: Check the return value of kmalloc() in function mos7840_get_reg(), 
in file drivers/usb/serial/mos7840.c.

Signed-off-by: Amit Choudhary <[EMAIL PROTECTED]>

diff --git a/drivers/usb/serial/mos7840.c b/drivers/usb/serial/mos7840.c
index 021be39..91d474b 100644
--- a/drivers/usb/serial/mos7840.c
+++ b/drivers/usb/serial/mos7840.c
@@ -475,6 +475,14 @@ static int mos7840_get_reg(struct moschi
int ret = 0;
buffer = (__u8 *) mcs->ctrl_buf;
 
+   /* The memory for ctrl_buf is allocated in
+* mos7840_startup(), but it is not checked if
+* kmalloc failed. So, mcs->ctrl_buf might be NULL.
+* So, it should be checked here.
+*/
+   if (!buffer)
+   return -ENOMEM;
+
 //  dr=(struct usb_ctrlrequest *)(buffer);
dr = (void *)(buffer + 2);
dr->bRequestType = MCS_RD_RTYPE;


[PATCH] drivers/media/video/stv680.c: check kmalloc() return value.

2007-03-08 Thread Amit Choudhary
Description: Check the return value of kmalloc() in function 
stv680_start_stream(), in file drivers/media/video/stv680.c.

Signed-off-by: Amit Choudhary <[EMAIL PROTECTED]>

diff --git a/drivers/media/video/stv680.c b/drivers/media/video/stv680.c
index 6d1ef1e..a1ec3ac 100644
--- a/drivers/media/video/stv680.c
+++ b/drivers/media/video/stv680.c
@@ -687,7 +687,7 @@ static int stv680_start_stream (struct u
stv680->sbuf[i].data = kmalloc (stv680->rawbufsize, GFP_KERNEL);
if (stv680->sbuf[i].data == NULL) {
PDEBUG (0, "STV(e): Could not kmalloc raw data buffer %i", i);
-   return -1;
+   goto nomem_err;
}
}
 
@@ -698,7 +698,7 @@ static int stv680_start_stream (struct u
stv680->scratch[i].data = kmalloc (stv680->rawbufsize, GFP_KERNEL);
if (stv680->scratch[i].data == NULL) {
PDEBUG (0, "STV(e): Could not kmalloc raw scratch buffer %i", i);
-   return -1;
+   goto nomem_err;
}
stv680->scratch[i].state = BUFFER_UNUSED;
}
@@ -706,7 +706,7 @@ static int stv680_start_stream (struct u
for (i = 0; i < STV680_NUMSBUF; i++) {
urb = usb_alloc_urb (0, GFP_KERNEL);
if (!urb)
-   return -ENOMEM;
+   goto nomem_err;
 
/* sbuf is urb->transfer_buffer, later gets memcpyed to scratch */
usb_fill_bulk_urb (urb, stv680->udev,
@@ -721,6 +721,21 @@ static int stv680_start_stream (struct u
 
stv680->framecount = 0;
return 0;
+
+ nomem_err:
+   for (i = 0; i < STV680_NUMSCRATCH; i++) {
+   kfree(stv680->scratch[i].data);
+   stv680->scratch[i].data = NULL;
+   }
+   for (i = 0; i < STV680_NUMSBUF; i++) {
+   usb_kill_urb(stv680->urb[i]);
+   usb_free_urb(stv680->urb[i]);
+   stv680->urb[i] = NULL;
+   kfree(stv680->sbuf[i].data);
+   stv680->sbuf[i].data = NULL;
+   }
+   return -ENOMEM;
+
 }
 
 static int stv680_stop_stream (struct usb_stv *stv680)


[PATCH] scsi: check kmalloc() return value.

2007-03-08 Thread Amit Choudhary
Description: Check the return value of kmalloc() in function ch_readconfig(), 
in file drivers/scsi/ch.c.

Signed-off-by: Amit Choudhary <[EMAIL PROTECTED]>

diff --git a/drivers/scsi/ch.c b/drivers/scsi/ch.c
index f6caa43..fcd635b 100644
--- a/drivers/scsi/ch.c
+++ b/drivers/scsi/ch.c
@@ -324,7 +324,7 @@ ch_readconfig(scsi_changer *ch)
if (!buffer)
return -ENOMEM;
memset(buffer,0,512);
-   
+
memset(cmd,0,sizeof(cmd));
cmd[0] = MODE_SENSE;
cmd[1] = ch->device->lun << 5;
@@ -367,7 +367,7 @@ ch_readconfig(scsi_changer *ch)
} else {
vprintk("reading element address assigment page failed!\n");
}
-   
+
/* vendor specific element types */
for (i = 0; i < 4; i++) {
if (0 == vendor_counts[i])
@@ -384,6 +384,10 @@ ch_readconfig(scsi_changer *ch)
/* look up the devices of the data transfer elements */
ch->dt = kmalloc(ch->counts[CHET_DT]*sizeof(struct scsi_device),
 GFP_KERNEL);
+   if (!ch->dt) {
+   kfree(buffer);
+   return -ENOMEM;
+   }
for (elem = 0; elem < ch->counts[CHET_DT]; elem++) {
id  = -1;
lun = 0;


RE: [PATCH 0/2] NET: Multiple queue network device support REPOST

2007-03-08 Thread Waskiewicz Jr, Peter P
> This is not a problem.
> 
> Since the ->enqueue function stores references to the SKBs, 
> any change of the dev->qdisc has to flush those references 
> somehow, and it is at that point that you can fixup the skb 
> queue mappings.
> 
> This happens via invoking the qdisc->ops->reset() method.
> 

Thanks Dave.  It seems expensive to change all the skb's if this type of
event occurs, but I suppose it is a corner case and can be tolerated.
Please let me make this change and I'll resubmit.

Also, the first patch set was RFC, and you were the only one to submit
feedback to me.  :-/  So I figured submitting as an official patch for
feedback might get more attention.

And to reiterate what Auke asked, what did happen to the patches?  I see
things getting through the netdev mailing list, but they're not showing
up on www.spinics.net/lists/netdev.  Was it something I did when
sending, or did majordomo hiccup?

Thanks,

-PJ Waskiewicz


[PATCH] drivers/media/video/videocodec.c: check kmalloc() return value.

2007-03-08 Thread Amit Choudhary
Description: Check the return value of kmalloc() in function 
videocodec_build_table(), in file drivers/media/video/videocodec.c.

Signed-off-by: Amit Choudhary <[EMAIL PROTECTED]>

diff --git a/drivers/media/video/videocodec.c b/drivers/media/video/videocodec.c
index 2ae3fb2..16fc1dd 100644
--- a/drivers/media/video/videocodec.c
+++ b/drivers/media/video/videocodec.c
@@ -348,6 +348,8 @@ #define LINESIZE 100
kfree(videocodec_buf);
videocodec_buf = (char *) kmalloc(size, GFP_KERNEL);
 
+   if (!videocodec_buf)
+   return 0;
i = 0;
i += scnprintf(videocodec_buf + i, size - 1,
  "lave or attached aster name  type flagsmagic   
 ");


Re: [PATCH 0/2] NET: Multiple queue network device support REPOST

2007-03-08 Thread David Miller
From: "Waskiewicz Jr, Peter P" <[EMAIL PROTECTED]>
Date: Thu, 8 Mar 2007 22:42:19 -0800

> This was taken into consideration, and I did reply that my concern for
> doing that could cause stale data in the skb if the queue mapping
> changed.

This is not a problem.

Since the ->enqueue function stores references to the SKBs,
any change of the dev->qdisc has to flush those references
somehow, and it is at that point that you can fixup the
skb queue mappings.

This happens via invoking the qdisc->ops->reset() method.
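
The fixup described above can be sketched as a toy userspace model. This is not kernel code: `toy_skb`, `toy_qdisc`, `toy_enqueue()` and `toy_qdisc_reset()` are hypothetical names, and the 16-bit `queue_mapping` field only follows the suggestion made elsewhere in this thread. The sketch assumes that whatever changes the band-to-queue mapping also goes through the reset path, which then rewrites the cached queue of every packet still referenced.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_BANDS 4
#define TOY_QLEN  8

/* Hypothetical stand-in for an sk_buff: only the fields the sketch needs. */
struct toy_skb {
	uint16_t priority;       /* band chosen for this packet */
	uint16_t queue_mapping;  /* hardware queue cached at enqueue time */
};

/* Hypothetical stand-in for a qdisc holding references to queued skbs. */
struct toy_qdisc {
	uint16_t band_to_queue[TOY_BANDS];
	struct toy_skb *queued[TOY_QLEN];
	size_t len;
};

/* Enqueue caches the current band-to-queue mapping in the skb. */
static int toy_enqueue(struct toy_qdisc *q, struct toy_skb *skb)
{
	if (q->len == TOY_QLEN || skb->priority >= TOY_BANDS)
		return -1;
	skb->queue_mapping = q->band_to_queue[skb->priority];
	q->queued[q->len++] = skb;
	return 0;
}

/*
 * Install a new mapping, then walk every skb the qdisc still references
 * and rewrite its cached queue, so no stale value survives until dequeue.
 */
static void toy_qdisc_reset(struct toy_qdisc *q, const uint16_t *new_map)
{
	size_t i;

	for (i = 0; i < TOY_BANDS; i++)
		q->band_to_queue[i] = new_map[i];
	for (i = 0; i < q->len; i++)
		q->queued[i]->queue_mapping =
			q->band_to_queue[q->queued[i]->priority];
}
```

A packet enqueued on band 2 under the identity mapping would carry queue 2; after a reset with the mapping reversed it would carry queue 1, without waiting for the band to drain.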


Re: [PATCH] swsusp: Disable nonboot CPUs before entering platform suspend

2007-03-08 Thread Rafael J. Wysocki
On Friday, 9 March 2007 02:11, Len Brown wrote:
> On Wednesday 07 March 2007 18:13, Rafael J. Wysocki wrote:
> > On Wednesday, 7 March 2007 23:49, Andrew Morton wrote:
> > > On Wed, 7 Mar 2007 23:14:29 +0100
> > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > 
> > > > On Wednesday, 7 March 2007 22:16, Andrew Morton wrote:
> > > > > On Wed, 7 Mar 2007 20:44:11 +0100
> > > > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > > > 
> > > > > > From: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > > > > > 
> > > > > > > Prevent the WARN_ON() in
> > > > > > > arch/x86_64/kernel/acpi/sleep.c:init_low_mapping() from
> > > > > > > triggering by disabling nonboot CPUs before we finally enter
> > > > > > > the platform suspend.
> > > > > > 
> > > > > > Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > > > > > ---
> > > > > >  kernel/power/disk.c |1 +
> > > > > >  kernel/power/user.c |2 +-
> > > > > >  2 files changed, 2 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > Index: linux-2.6.21-rc2-mm2/kernel/power/disk.c
> > > > > > ===
> > > > > > --- linux-2.6.21-rc2-mm2.orig/kernel/power/disk.c
> > > > > > +++ linux-2.6.21-rc2-mm2/kernel/power/disk.c
> > > > > > @@ -61,6 +61,7 @@ static void power_down(suspend_disk_meth
> > > > > > switch(mode) {
> > > > > > case PM_DISK_PLATFORM:
> > > > > > if (pm_ops && pm_ops->enter) {
> > > > > > +   disable_nonboot_cpus();
> > > > > > kernel_shutdown_prepare(SYSTEM_SUSPEND_DISK);
> > > > > > pm_ops->enter(PM_SUSPEND_DISK);
> > > > > > break;
> > > > > > Index: linux-2.6.21-rc2-mm2/kernel/power/user.c
> > > > > > ===
> > > > > > --- linux-2.6.21-rc2-mm2.orig/kernel/power/user.c
> > > > > > +++ linux-2.6.21-rc2-mm2/kernel/power/user.c
> > > > > > @@ -398,9 +398,9 @@ static int snapshot_ioctl(struct inode *
> > > > > >  
> > > > > > case PMOPS_ENTER:
> > > > > > if (data->platform_suspend) {
> > > > > > +   disable_nonboot_cpus();
> > > > > > 
> > > > > > kernel_shutdown_prepare(SYSTEM_SUSPEND_DISK);
> > > > > > error = pm_ops->enter(PM_SUSPEND_DISK);
> > > > > > -   error = 0;
> > > > > > }
> > > > > > break;
> > > > > 
> > > > > Is this considered 2.6.21 material?  If so why?
> > > > 
> > > > Well, the WARN_ON() in
> > > > arch/x86_64/kernel/acpi/sleep.c:init_low_mapping() triggers every
> > > > time an SMP x86_64 box is suspended to disk using the platform mode
> > > > (default), which is quite annoying IMHO and users think something
> > > > wrong is going on.  This will probably cause them to report the
> > > > problem and I'd rather like to avoid handling these reports. ;-)
> > > 
> > > Well sure - if patches were always error-free, we'd always apply them
> > > immediately.
> > > 
> > > The question is: is the risk of this patch breaking things exceeded by the
> > > benefit which you describe?
> > 
> > Well, it has survived some testing (http://lkml.org/lkml/2007/3/7/16).
> > Also, before the code ordering in 2.6.21-rc* we had been running on
> > one CPU here, so I think the risk is small.
> > 
> > We could remove the WARN_ON() as Pavel has just suggested, but first
> > I'd like to know who put it there and why.
> > 
> 
> Shaohua added it between 2.6.17 and 2.6.18
> 55b2355eefc2f160246226d4d69fed431173a4d5

Yes, DaveJ has already noticed that, but what is it there for?

Rafael


"swapon" function manpage

2007-03-08 Thread zhangxiliang
hello,
  The manpage of the "swapon" function has had an error since Linux 2.6.17.
  MAX_SWAPFILES should be 30 in the latest version, because swap migration
uses the two highest swap type numbers (30 and 31).
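
The arithmetic behind the value 30 can be sketched as follows. This is a minimal illustration, assuming that swap types are encoded in a 5-bit field; the `TOY_` names are hypothetical and only mirror the layout as I recall it from include/linux/swap.h, so check the header of your kernel version before relying on it.

```c
/* Assumption: swap types live in a 5-bit field, giving 32 possible values. */
#define TOY_MAX_SWAPFILES_SHIFT 5

/* With swap migration enabled, the two highest type numbers are reserved. */
#define TOY_MAX_SWAPFILES       ((1 << TOY_MAX_SWAPFILES_SHIFT) - 2)
#define TOY_SWP_MIGRATION_READ  TOY_MAX_SWAPFILES        /* type 30 */
#define TOY_SWP_MIGRATION_WRITE (TOY_MAX_SWAPFILES + 1)  /* type 31 */

/* Returns 1 if a swap type number is usable for a real swap area. */
static int toy_swap_type_is_usable(unsigned int type)
{
	return type < TOY_MAX_SWAPFILES;
}
```

Under these assumptions types 0 through 29 remain available for swap areas, which is the limit of 30 that the manpage should state.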

Regards
Zhang Xiliang 





Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler

2007-03-08 Thread hui
On Thu, Mar 08, 2007 at 10:31:48PM -0800, Linus Torvalds wrote:
> On Thu, 8 Mar 2007, Bill Davidsen wrote:
> > Please, could you now rethink pluggable scheduler as well? Even if one had to
> > be chosen at boot time and couldn't be changed thereafter, it would still
> > allow a few new thoughts to be included.
> 
> No. Really.
> 
> I absolutely *detest* pluggable schedulers. They have a huge downside: 
> they allow people to think that it's ok to make special-case schedulers. 
> And I simply very fundamentally disagree.

Linus,

This is where I have to respectfully disagree. There are types of loads
that aren't covered in SCHED_OTHER. They are typically certain real time
loads and those folks (regardless of -rt patch) would benefit greatly
from having something like that in place. Those scheduler developers can
plug in (at compile time) their work without having to track and forward
port their code constantly so that non-SCHED_OTHER policies can be
experimented with easily.

This is especially so with rate monotonic influenced schedulers that are
in the works by real time folks, stock kernel or not. This is about
making Linux generally accessible to those folks and not folks doing
SCHED_OTHER work. They are orthogonal.

bill



Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

2007-03-08 Thread Roland McGrath
> That sounds like a rather fragile approach to avoiding a minimal amount of 
> work.  Debug exceptions don't occur very often, and when they do it won't 
> matter too much if we go through some extra notifier-chain callouts.

When single-stepping occurs it happens repeatedly many times, and that
doesn't need any more overhead than it already has.  

> It turns out that this won't work correctly unless I use something
> stronger, like a spinlock or RCU.  Either one seems like overkill.

What is the problem with just clearing TIF_DEBUG?  It just means that in
the SIGKILL case, the dying thread won't switch in its local debugregs.
The global kernel allocations will already be set in the processor from the
previous context, and old user-address allocations do no harm since we
won't run in user mode again before switching out at the end of do_exit.

> Is there any way to find out from within the
> switch_to_thread_hw_breakpoint routine whether the task is in this unusual
> state?  (By which I mean the task is being debugged and the debugger
> hasn't told it to start running.)  Would (tsk->exit_code == SIGKILL) work?  

That won't necessarily work.  There isn't any cheap check that won't also
catch a task preempted on its way to stopping for the debugger.  

> If not, can we add a TIF_DEBUG_STOPPED flag?  

I'm not clear on what that would mean, but it's probably not an idea I like.

> Or should I just go with a spinlock?

If it's really necessary, but it hasn't proved so for any other switched
per-thread state.  As long as you aren't doing per-thread kernel-mode
allocations, I don't see why you need anything other than TIF_DEBUG.

> Is SIGKILL the only way this can happen?

It should be, but there might be some stray wake_up_process calls in the
kernel that can violate [up]trace's supposed monopoly on TASK_TRACED (or
duopoly with SIGKILL, I suppose I should say).  If there is no SIGKILL,
then the task will just call schedule again nearly immediately to go back
to blocking, which will switch out unless there is a second wakeup right
then.

> In a similar vein, I need a reliable way to know whether a task has gone 
> through exit_thread().  If it has, then its hw_breakpoint area has been 
> deallocated and a new one must not be allocated.  Will (tsk->flags & 
> PF_EXITING) always be true once that happens?

PF_EXITING is set after there is no possibility of returning to user mode,
but a while before exit_thread, when you might still want kernel-mode
breakpoints.  If the only per-thread allocations you support are for user
mode, then you can certainly refuse to do any when PF_EXITING is set.
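
A minimal sketch of that last point, as a userspace model rather than kernel code. `toy_task`, `TOY_PF_EXITING` and `toy_attach_user_bp()` are hypothetical, the flag value and the error codes are chosen only for illustration, and the sketch deliberately ignores the locking question discussed earlier in this message.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag bit for this model; not taken from the kernel headers. */
#define TOY_PF_EXITING 0x4u

struct toy_task {
	unsigned int flags;
	void *user_bp_area;   /* per-thread user-mode breakpoint state */
};

/*
 * Attach a user-mode breakpoint area to a task, refusing once the task
 * is exiting. Returns 0 on success, -ESRCH if the task is exiting, and
 * -EBUSY if an area is already attached.
 */
static int toy_attach_user_bp(struct toy_task *tsk, void *area)
{
	if (tsk->flags & TOY_PF_EXITING)
		return -ESRCH;
	if (tsk->user_bp_area != NULL)
		return -EBUSY;
	tsk->user_bp_area = area;
	return 0;
}
```

Because the check happens before anything is stored, an exiting task never ends up with a fresh area that nobody would free.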


Thanks,
Roland



Re: [PATCH] Use more gcc extensions in the Linux headers

2007-03-08 Thread Andrey Panin
On 068, 03 09, 2007 at 04:56:32PM +1100, Rusty Russell wrote:
> __builtin_types_compatible_p() has been around since gcc 2.95,

but it's not available in Intel C compiler IIRC :(

> and we don't use it anywhere.  This patch quietly fixes that.
>
> Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
> 
> diff -r f0ff8138f993 include/linux/kernel.h
> --- a/include/linux/kernel.h  Fri Mar 09 16:40:25 2007 +1100
> +++ b/include/linux/kernel.h  Fri Mar 09 16:44:04 2007 +1100
> @@ -35,7 +35,9 @@ extern const char linux_proc_banner[];
>  #define ALIGN(x,a)   __ALIGN_MASK(x,(typeof(x))(a)-1)
>  #define __ALIGN_MASK(x,mask) (((x)+(mask))&~(mask))
>  
> -#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
> +#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) \
> + + sizeof(typeof(int[1 - 2*!!__builtin_types_compatible_p(typeof(arr), \
> +  typeof(&arr[0]))]))*0)
>  #define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f))
>  #define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
>  #define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
> 
> 

-- 
Andrey Panin| Linux and UNIX system administrator
[EMAIL PROTECTED]   | PGP key: wwwkeys.pgp.net




Re: [PATCH 0/2] NET: Multiple queue network device support REPOST

2007-03-08 Thread Kok, Auke

David Miller wrote:

You didn't address my correction the other day wherein I clarified
for you that my idea was not to store the queue mapping in
skb->priority but rather to shrink skb->priority to a u16 and
add a new u16 skb->queue_mapping or whatever field to store the
necessary information.

You're just posting a set of patches using the same approach again
plus some bug fixes, so there is essentially nothing new for anyone to
review.

Why ask for feedback if you fail to take any of it into consideration?
:-/


Where are the patches anyway? They didn't seem to make it to either netdev or
lkml!

Auke


Re: [linux-pm] 2.6.21-rc1: known regressions (part 2)

2007-03-08 Thread Pavel Machek
Hi!

> Pavel, I tried with your .config, and indeed the system came back to life
> 2-3 minutes after I pressed Fn/F4, so the issue does seem to be with the disk.
> It could be that the same takes place with my original .config - maybe
> I just wasn't patient enough. I'll need to re-test that.
> 
> However, I noticed that, after resume, when the system is presumably
> functional, if I try to suspend to RAM again, this second suspend hangs,
> displaying the following on screen:
> 
> [   17.17] ACPI: PCI Interrupt :02:00.0[A] -> GSI 16 (level, low) -> IRQ 20
> [   17.17] PCI: Setting latency timer of device :02:00.0 to 64
> [   17.25] e1000: :02:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x1) 00:16:41:54:6c:47
> [   17.33] e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
> 
> The crescent LED starts blinking and does not seem to stop for at least 10 min;
> I ran out of patience after that. It could be that it's just very slow again.
> 
> Pavel, did you try suspend to RAM after a successful resume from
> RAM?

Seems to work ok in -rc3... as long as I do not mix s2ram with s2disk.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH] spi subsystem: destroy the spi_bitbang workqueue only after the spi master is unregistered

2007-03-08 Thread David Brownell
On Wednesday 07 March 2007 3:48 pm, Chris Lesiak wrote:
> From: Chris Lesiak <[EMAIL PROTECTED]>
> 
> This patch fixes a bug in the cleanup of an spi_bitbang bus. 

It's nearly right, but see below.


> @@ -505,28 +499,10 @@ EXPORT_SYMBOL_GPL(spi_bitbang_start);
>   */
>  int spi_bitbang_stop(struct spi_bitbang *bitbang)
>  {
> - unsigned limit = 500;
> -
> - spin_lock_irq(&bitbang->lock);
> - bitbang->shutdown = 0;
> - while (!list_empty(&bitbang->queue) && limit--) {
> - spin_unlock_irq(&bitbang->lock);
> -
> - dev_dbg(bitbang->master->cdev.dev, "wait for queue\n");
> - msleep(10);
> -
> - spin_lock_irq(&bitbang->lock);
> - }

You completely removed an odious busy-wait, which is good ...

> - spin_unlock_irq(&bitbang->lock);
> - if (!list_empty(&bitbang->queue)) {
> - dev_err(bitbang->master->cdev.dev, "queue didn't empty\n");
> - return -EBUSY;
> - }
> + spi_unregister_master(bitbang->master);

... but right here there should be a WARN_ON(!list_empty(...)) to
flag the corresponding bogosity:  that somehow a request never
completed.

>  
>   destroy_workqueue(bitbang->workqueue);
>  
> - spi_unregister_master(bitbang->master);
> -
>   return 0;
>  }
>  EXPORT_SYMBOL_GPL(spi_bitbang_stop);
> 
> 


RE: [PATCH 0/2] NET: Multiple queue network device support REPOST

2007-03-08 Thread Waskiewicz Jr, Peter P
> -Original Message-
> From: David Miller [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, March 08, 2007 10:22 PM
> To: Waskiewicz Jr, Peter P
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; 
> Leech, Christopher
> Subject: Re: [PATCH 0/2] NET: Multiple queue network device 
> support REPOST
> 
> 
> You didn't address my correction the other day wherein I 
> clarified for you that my idea was not to store the queue mapping in
> skb->priority but rather to shrink skb->priority to a u16 and
> add a new u16 skb->queue_mapping or whatever field to store 
> the necessary information.
> 
> You're just posting a set of patches using the same approach 
> again plus some bug fixes, so there is essentially nothing 
> new for anyone to review.
> 
> Why ask for feedback if you fail to take any of it into consideration?
> :-/
> 

This was taken into consideration, and I did reply that my concern for
doing that could cause stale data in the skb if the queue mapping
changed.  If a qdisc was implemented that could change the band to queue
mapping without having to reload the qdisc, the result could have skb's
heading for the wrong queues until the old data was drained from the
bands.  An example:

->enqueue() - maps queue, commits to skb, adds to band
netif_stop_queue(dev) - event is triggered that could cause a qdisc to
remap bands to queues, drain hardware queues
netif_wake_queue(dev) - reconfiguration is complete, resume transmission
->dequeue() - grab an skb enqueued prior to reconfiguration, read queue
from skb, hard_start_xmit() to the wrong queue


Re: [PATCH] Use more gcc extensions in the Linux headers

2007-03-08 Thread Linus Torvalds


On Fri, 9 Mar 2007, Rusty Russell wrote:
>
> __builtin_types_compatible_p() has been around since gcc 2.95, and we
> don't use it anywhere.  This patch quietly fixes that.

Whee.

Rusty, that's a work of art.

However, I would suggest that you never show it to anybody ever again. I'm 
sure that in fifty years, it will be worth much more. So please keep it 
tightly under wraps, to keep people from gouging their eyes out^W^W^W^W^W^W^W 
make a killing in the art market.

Please.

Linus


Re: [PATCH] fix read past end of array in md/linear.c

2007-03-08 Thread Andy Isaacson
On Thu, Mar 08, 2007 at 09:37:46PM -0500, Bill Davidsen wrote:
> Andy Isaacson wrote:
> >% dd bs=1 seek=840716287 if=/dev/zero of=d1 count=1
> >% for i in 2 3 4; do dd if=/dev/zero of=d$i bs=1k count=$(($i+150)); done
[snip]
> >-for (j=i; i >+for (j=i; j > sz += conf->disks[j].size;
> 
> After looking at that code, I have to wonder how this ever worked, or if 
> in fact anyone ever took this path. I assume that the value of sz caused 
> the loop exit in all cases, since this has been in the code at least 
> since 2.6.15, oldest thing I have handy.

Well, just about any sane set of device sizes causes sz to rapidly
exceed min_spacing.  You'll notice that my failure case is
{ 800MB, 151kB, 152kB, 153kB, 154kB }.

And even in the failure case, it's just a read from uninitialized
memory, which is probably either a small value (so it won't make the
answer very wrong) or a large value (so it will be rejected in the
immediately following code).  In my case it happened to be some slab
poison of 0xa5a5a5a5 or something like that, and the code went on just
fine.
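
To make the effect of testing `i` instead of `j` concrete, here is a toy model of the summing loop. It is a sketch under stated assumptions, not the md/linear.c code: `toy_scan()` is a hypothetical helper, the `min_spacing` value of 1000 in the usage note is invented, and the array is padded with zeros up to `cap` so the model itself never reads out of bounds.

```c
#include <stddef.h>

/*
 * Walk sizes[] from index i, summing entries until the total reaches
 * min_spacing. `fixed` selects the corrected bound (test j) over the
 * old one (test i). Returns the index at which the walk stopped.
 * `cap` is the physical length of the padded array and caps the walk.
 */
static size_t toy_scan(const unsigned long *sizes, size_t cnt, size_t cap,
		       size_t i, unsigned long min_spacing, int fixed)
{
	unsigned long sz = 0;
	size_t j;

	for (j = i; j < cap; j++) {
		size_t tested = fixed ? j : i;

		if (tested >= cnt - 1 || sz >= min_spacing)
			break;
		sz += sizes[j];
	}
	return j;
}
```

With sizes of { 819200, 151, 152, 153, 154 } kB padded to eight entries, a start index of 1 and a `min_spacing` of 1000, the corrected loop stops at index 4, while the old condition never trips and the walk runs to the cap at index 8, past the five real entries.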

-andy


Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler

2007-03-08 Thread Linus Torvalds


On Thu, 8 Mar 2007, Bill Davidsen wrote:
>
> Please, could you now rethink pluggable scheduler as well? Even if one had to
> be chosen at boot time and couldn't be changed thereafter, it would still allow
> a few new thoughts to be included.

No. Really.

I absolutely *detest* pluggable schedulers. They have a huge downside: 
they allow people to think that it's ok to make special-case schedulers. 
And I simply very fundamentally disagree.

If you want to play with a scheduler of your own, go wild. It's easy 
(well, you'll find out that getting good results isn't, but that's a 
different thing). But actual pluggable schedulers just cause people to 
think that "oh, the scheduler performs badly under circumstance X, so 
let's tell people to use special scheduler Y for that case".

And CPU scheduling really isn't that complicated. It's *way* simpler than 
IO scheduling. There simply is *no*excuse* for not trying to do it well 
enough for all cases, or for having special-case stuff.

But even IO scheduling actually ends up being largely the same. Yes, we 
have pluggable schedulers, and we even allow switching them, but in the 
end, we don't want people to actually do it. It's much better to have a 
scheduler that is "good enough" than it is to have five that are "perfect" 
for five particular cases.

Linus


Re: 2.6.21-rc3-mm1 RSDL results

2007-03-08 Thread Con Kolivas
On Friday 09 March 2007 16:39, Matt Mackall wrote:
> First off, let me say that I think your approach has great promise,
> but I'm afraid it doesn't work so well here yet.
>
> Box is an R51 Thinkpad, 1.7GHz Pentium M. I'm using a make -j 5 as a
> test load.
>
> With 2.6.21-rc2-mm2, I get slightly sluggish response for opening new
> terminals, scrolling in Galeon, and a bit jerky behaviour for spinning
> Beryl's 3D desktop. Playing MP3s off an sshfs FUSE mount works fine.
> Typing across ssh sessions has no noticeable lag. Mouse pointer
> movement is smooth.
>
> With 2.6.21-rc3-mm1, terminals take longer to open, Galeon is
> noticeably more sluggish, and Beryl's desktop switching goes from being
> jerky to a 5-second agony. Typing in shells, remote or not,
> lags noticeably. Mouse pointer is alternately smooth or jerky. But
> MP3s still work great!
>
> Problems persist with make -j 2 and make.

make -j5 sucks; you'll get precisely 1/6th cpu for galeon with this scheduler,
which is perfectly fair, and I make no apology for it, nor do I plan to
optimise for it. With make (without jobs) you'll still only get 50% cpu, so it
should be precisely half speed unless you nice it. Does it feel precisely
half speed? It's supposed to. This is one of the drawbacks of a perfectly
fair approach: it's... fair and will need more liberal use of nice.
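
The numbers above follow from simple division. As a rough sketch, assuming every runnable task has the same nice level and there is a single CPU (`toy_fair_share()` is a hypothetical helper, not part of the scheduler):

```c
/* Share of one CPU that each of n equal-priority runnable tasks receives. */
static double toy_fair_share(unsigned int runnable_tasks)
{
	if (runnable_tasks == 0)
		return 0.0;
	return 1.0 / (double)runnable_tasks;
}
```

With make -j5 there are five compile jobs plus galeon, so six runnable tasks and a share of 1/6 each; with plain make there are two, so each gets one half.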

-- 
-ck


Re: [PATCH] Use more gcc extensions in the Linux headers

2007-03-08 Thread Stephen Rothwell
On Fri, 09 Mar 2007 16:56:32 +1100 Rusty Russell <[EMAIL PROTECTED]> wrote:
>
> __builtin_types_compatible_p() has been around since gcc 2.95, and we
> don't use it anywhere.  This patch quietly fixes that.

After staring at this for about 2 minutes, how about a commit message like:

Make ARRAY_SIZE complain strangely if passed a pointer instead of an
array.

:-)
--
Cheers,
Stephen Rothwell[EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/




Re: [PATCH 0/2] NET: Multiple queue network device support REPOST

2007-03-08 Thread David Miller

You didn't address my correction the other day wherein I clarified
for you that my idea was not to store the queue mapping in
skb->priority but rather to shrink skb->priority to a u16 and
add a new u16 skb->queue_mapping or whatever field to store the
necessary information.

You're just posting a set of patches using the same approach again
plus some bug fixes, so there is essentially nothing new for anyone to
review.

Why ask for feedback if you fail to take any of it into consideration?
:-/


Re: block_til_ready

2007-03-08 Thread Eric Dumazet

Mockern wrote:

Hi,

What is the simplest implementation of block_til_ready for tty driver?

Thanks,

Andy


Welcome Andy

Since your messages always make me wonder if you are some kind of robot, able
to post one "one line" message to lkml every day, I have one suggestion:


Next time, try to write messages with detailed context, so that a reader of
these messages might care, possibly *understand* your question, and thus be
able to help you and others as well.


For example, a URL to your sources, even in a 'not yet working' state, would
be very nice.


Have a nice day
Eric



block_til_ready

2007-03-08 Thread Mockern
Hi,

What is the simplest implementation of block_til_ready for tty driver?

Thanks,

Andy


[PATCH] Use more gcc extensions in the Linux headers

2007-03-08 Thread Rusty Russell
__builtin_types_compatible_p() has been around since gcc 2.95, and we
don't use it anywhere.  This patch quietly fixes that.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r f0ff8138f993 include/linux/kernel.h
--- a/include/linux/kernel.hFri Mar 09 16:40:25 2007 +1100
+++ b/include/linux/kernel.hFri Mar 09 16:44:04 2007 +1100
@@ -35,7 +35,9 @@ extern const char linux_proc_banner[];
 #define ALIGN(x,a) __ALIGN_MASK(x,(typeof(x))(a)-1)
 #define __ALIGN_MASK(x,mask)   (((x)+(mask))&~(mask))
 
-#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0])
  \
+   + sizeof(typeof(int[1 - 2*!!__builtin_types_compatible_p(typeof(arr), \
+typeof([0]))]))*0)
 #define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f))
 #define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
 #define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
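
As a rough illustration of what the patched macro buys (a sketch, not part of the patch itself): the user-space snippet below wraps the same type check in a helper macro. MUST_BE_ARRAY, irq_table, banner and the two accessor functions are names made up for this example, and __typeof__ is used instead of typeof so that it also builds in gcc's strict ISO modes.

```c
#include <stddef.h>

/*
 * Evaluates to 0 for a real array. For a pointer, typeof(arr) and
 * typeof(&(arr)[0]) are compatible types, so the array length becomes
 * 1 - 2 = -1 and gcc rejects the negative-sized array at compile time.
 */
#define MUST_BE_ARRAY(arr)						\
	(sizeof(__typeof__(int[1 - 2 * !!__builtin_types_compatible_p(	\
		__typeof__(arr), __typeof__(&(arr)[0]))])) * 0)

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + MUST_BE_ARRAY(arr))

int irq_table[7];		/* hypothetical data */
const char banner[] = "lkml";	/* 4 characters plus the trailing NUL */

size_t irq_table_entries(void)
{
	return ARRAY_SIZE(irq_table);
}

size_t banner_bytes(void)
{
	return ARRAY_SIZE(banner);
}

/*
 * int *p = irq_table;
 * ARRAY_SIZE(p);	// would not build: the check array has negative size
 */
```

With the old macro, passing a pointer silently yields sizeof(pointer) / sizeof(element); with the check, the mistake stops the build instead.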




Re: sys_write() racy for multi-threaded append?

2007-03-08 Thread Eric Dumazet

Michael K. Edwards wrote:
> On 3/8/07, Eric Dumazet <[EMAIL PROTECTED]> wrote:
> > Absolutely not. We dont want to slow down kernel 'just in case a fool
> > might want to do crazy things'
>
> Actually, I think it would make the kernel (negligibly) faster to bump
> f_pos before the vfs_write() call.  Unless fget_light sets fput_needed
> or the write doesn't complete cleanly, you won't have to touch the
> file table entry again after vfs_write() returns.  You can adjust
> vfs_write to grab f_dentry out of the file before going into
> do_sync_write.  do_sync_write is done with the struct file before it
> goes into the aio_write() loop.  Result: you probably save at least an
> L1 cache miss, unless the aio_write loop is so frugal with L1 cache
> that it doesn't manage to evict the struct file.
>
> Patch to follow.


Don't even try: you *cannot* do that without breaking the standards or
without a performance drop.

The only safe way would be to lock the file during the whole read()/write()
syscall, and we don't want this (it would be more expensive than the current
code).

Don't forget that 'file' may be a socket, a tty, or whatever, not a regular
file.

The standards say:

If an error occurs, the file pointer remains unchanged.

You cannot know for sure how many bytes will be written, since write() can
return a count that is different from buflen.

So you cannot update f_pos before calling vfs_write().

About your L1 'miss', don't forget that multi-threaded apps are going to
atomic_dec_and_test(&file->f_count) anyway when fput() is done at the end of
the syscall. And you were concerned about multi-threaded apps, weren't you?
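
The point that write() may return a count different from the requested length is also why user-space callers loop. A minimal sketch, assuming a blocking descriptor; write_all is a hypothetical helper name, not something from this thread:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write all len bytes of buf to fd, continuing after short writes.
 * Returns 0 on success, or -1 with errno as set by write().
 */
int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* nothing was written, retry */
			return -1;
		}
		p += n;			/* advance only by what was written */
		len -= (size_t)n;
	}
	return 0;
}
```

The caller only learns how far the file position moved after each write() returns, which mirrors the argument above: the final position cannot be known before the write has been attempted.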




2.6.21-rc3-mm1 RSDL results

2007-03-08 Thread Matt Mackall
First off, let me say that I think your approach has great promise,
but I'm afraid it doesn't work so well here yet.

Box is an R51 Thinkpad, 1.7GHz Pentium M. I'm using a make -j 5 as a
test load.

With 2.6.21-rc2-mm2, I get slightly sluggish response for opening new
terminals, scrolling in Galeon, and a bit jerky behaviour for spinning
Beryl's 3D desktop. Playing MP3s off an sshfs FUSE mount works fine.
Typing across ssh sessions has no noticeable lag. Mouse pointer
movement is smooth.

With 2.6.21-rc3-mm1, terminals take longer to open, Galeon is
noticeably more sluggish, and Beryl's desktop switching goes from being
jerky to a 5-second agony. Typing in shells, remote or not,
lags noticeably. Mouse pointer is alternately smooth or jerky. But
MP3s still work great!

Problems persist with make -j 2 and make.

-- 
Mathematics is the supreme nostalgia of our time.


Re: [patch 00/21] 2.6.19-stable review

2007-03-08 Thread Adrian Bunk
On Wed, Feb 21, 2007 at 02:36:40PM +0100, Stefan Richter wrote:
> Greg KH wrote:
> > This is the start of the stable review cycle for the 2.6.19.5 release.
> > 
> > This will probably be the last release of the 2.6.19-stable series, so
> > if there are patches that you feel should be applied to that tree,
> > please let me know.
> 
> There is one here: "Missing critical phys_to_virt in lib/swiotlb.c".
> http://lkml.org/lkml/2007/2/4/116
> It fixes a DMA related bug which was seen with a variety of drivers
> especially on EM64T machines with more than 3GB RAM. I hope you can
> extract the patch from this MIME attachment.
> 
> Adrian, AFAICS it applies as-is to 2.6.16.y too. I don't have a machine
> to test personally, but it is quite obvious.

Thanks.

It applies fine and it's now in my tree.

> The mentioned bigger patch has been merged by Linus between 2.6.20 and
> 2.6.21-rc1.


cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: [PATCH] Fix sparc TIF_USEDFPU flag atomicity

2007-03-08 Thread David Miller
From: Mathieu Desnoyers <[EMAIL PROTECTED]>
Date: Thu, 8 Mar 2007 22:12:27 -0500

> Fix sparc TIF_USEDFPU flag atomicity
> 
> Non atomic update of TIF can be very dangerous, except at thread structure
> creation time. Here I standardize the TIF_USEDFPU usage of the sparc arch.
> 
> Applies on 2.6.20.
> 
> Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>

Also applied, thanks a lot.


Re: [PATCH] Fix atomicity of TIF update in flush_thread() for sparc64

2007-03-08 Thread David Miller
From: Mathieu Desnoyers <[EMAIL PROTECTED]>
Date: Thu, 8 Mar 2007 21:38:14 -0500

> Fix atomicity of TIF update in flush_thread() for x86_64
                                                    ^^^^^^
You mean sparc64 of course, I fixed this up while committing your
patch, thanks a lot.


Re: [rfc][patch] futex: restartable futex_wait?

2007-03-08 Thread Nick Piggin
On Fri, Mar 09, 2007 at 12:02:31AM +0100, Thomas Gleixner wrote:
> On Thu, 2007-03-08 at 18:29 +0100, Ingo Molnar wrote:
> > * Nick Piggin <[EMAIL PROTECTED]> wrote:
> > 
> > > Hi Ingo,
> > > 
> > > I'm seeing an LTP test fail for ltp test sigaction_16_24. Basically, 
> > > it tests whether the SA_RESTART flag works for the sem_wait operation.
> 
> Not sure, whether the testcase is correct or not. See below
> 
> > > I see sem_wait is implemented with futex_wait, so I wonder whether we 
> > > can make it restartable? Am I going about it the right way? (Seems to 
> > > fix the testcase here).
> > 
> > i think that's quite right. I'm wondering why this never came up before? 
> > But your fix is not complete i think:
> > 
> > > + restart->arg2 = time;
> > > + return -ERESTART_RESTARTBLOCK;
> > > + }
> > 
> > 'time' here is relative, so the restarted syscall will do a /full/ wait 
> > again.
> > 
> > maybe we should rather convert futex timed-waits to hrtimers? Thomas?
> 
> The problem is that the original API is based on relative time and
> therefore can not be changed. 
> 
> sem_wait returns -EINTR to the application when it is interrupted, while
> pthread_mutex_lock does not.

But this still means sem_wait should restart if SA_RESTART is set, right?

And pthread_mutex_lock could be implemented to not return -EINTR, even if
futex_wait does, couldn't it? (I guess it probably already is, considering
that futex_wait already returns -EINTR).

> 
> http://www.opengroup.org/onlinepubs/009695399/functions/sem_wait.html
> 
> http://www.opengroup.org/onlinepubs/009695399/functions/pthread_mutex_lock.html
> 
> We need to create a separate op for the futex - just like the pi_futex
> and use absolute time there too. 
> 
>   tglx
> 
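
To make the relative-versus-absolute point concrete, here is a small user-space sketch, not kernel code. The helper names are hypothetical, and times are plain nanosecond counts on a single monotonic clock rather than jiffies or ktime_t. Converting the relative timeout to a deadline once means a restarted wait only sleeps for whatever is left, instead of the full interval again.

```c
#include <stdint.h>

/* Turn a relative timeout into an absolute deadline, once, on first entry. */
int64_t deadline_from_relative(int64_t now_ns, int64_t rel_timeout_ns)
{
	return now_ns + rel_timeout_ns;
}

/* On every (re)start, wait only for the time left until the deadline. */
int64_t remaining_until(int64_t deadline_ns, int64_t now_ns)
{
	int64_t left = deadline_ns - now_ns;

	return left > 0 ? left : 0;
}
```

For example, a 500 ns timeout taken at t=1000 and interrupted at t=1200 leaves 300 ns for the restarted wait; storing the relative value unchanged would wait another 500 ns.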


Re: [PATCH] [RSDL-mm 0/6] RSDL cpu scheduler for -mm

2007-03-08 Thread Con Kolivas
On Thursday 08 March 2007 13:54, Andrew Morton wrote:
> On Wed, 7 Mar 2007 17:43:45 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:
> > On Wed, 7 Mar 2007 12:26:42 +1100
> >
> > Con Kolivas <[EMAIL PROTECTED]> wrote:
> > > What follows is the same patch series that constitutes the RSDL
> > > "Rotating Staircase DeadLine" cpu scheduler resynced for
> > > 2.6.21-rc2-mm2.
> >
> > Big oops early in boot on x86_64 SMP, in rq_bitmap_error+0x97/0x9f.
> >
> > I stubbed it out with a `return MAX_RT_PRIO;' (I think) but it then
> > oopsed differently.  Before netconsole had come up, no serial console, no
> > digital camera.
> >
> > There's stuff in http://userweb.kernel.org/~akpm/ck/ - you can probably
> > boot that kernel on your own machine.
> >
> > I need to do rc3-mm1 now.  I might find some time to poke at this
> > further after that, but I have to leave for a week in .jp and it'll be
> > squeezy, sorry.
>
> well it boots on dual pIII and quad powerpc.
>
> The powerpc says
>
> Scheduler bitmap error - bitmap being reconstructed..
>
> during bootup.  But it didn't crash like the Nocona machine.

Any chance I could get the config of this powerpc so I can try building and 
running one on qemu? I can't seem to get this bitmap error on x86*.

Thanks!

-- 
-ck


Re: [rfc][patch] futex: restartable futex_wait?

2007-03-08 Thread Nick Piggin
On Thu, Mar 08, 2007 at 06:29:02PM +0100, Ingo Molnar wrote:
> 
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > Hi Ingo,
> > 
> > I'm seeing an LTP test fail for ltp test sigaction_16_24. Basically, 
> > it tests whether the SA_RESTART flag works for the sem_wait operation.
> > 
> > I see sem_wait is implemented with futex_wait, so I wonder whether we 
> > can make it restartable? Am I going about it the right way? (Seems to 
> > fix the testcase here).
> 
> i think that's quite right. I'm wondering why this never came up before? 
> But your fix is not complete i think:
> 
> > + restart->arg2 = time;
> > + return -ERESTART_RESTARTBLOCK;
> > + }
> 
> 'time' here is relative, so the restarted syscall will do a /full/ wait 
> again.

But it has been modified by schedule_timeout?



RE: [PATCH] Software Suspend: Fix suspend when console is in VT_AUTO/KD_GRAPHICS mode

2007-03-08 Thread Andrew Johnson
Daniel Drake wrote:
> Andrew Johnson wrote:
> > When the console is in VT_AUTO/KD_GRAPHICS mode, switching to the
> > SUSPEND_CONSOLE fails, resulting in vt_waitactive() waiting
> > indefinitely or until the task is interrupted.  The following patch
> > tests if a console switch can occur in set_console() and returns early
> > if a console switch is not possible.
> >
> > Signed-off-by: Andrew Johnson <[EMAIL PROTECTED]>
> >
> > diff -rup linux-2.6.20.1/drivers/char/vt.c linux/drivers/char/vt.c
> > --- linux-2.6.20.1/drivers/char/vt.c	2007-02-19 22:34:32.0 -0800
> > +++ linux/drivers/char/vt.c	2007-03-08 14:15:41.0 -0800
> > @@ -2188,10 +2188,20 @@ static void console_callback(struct work
> > release_console_sem();
> >  }
> >
> > -void set_console(int nr)
> > +extern char vt_dont_switch;
> > +
> > +int set_console(int nr)
> >  {
> > +   struct vc_data *vc = vc_cons[fg_console].d;
> > +
> > +   if(!vc_cons_allocated(nr) || vt_dont_switch || vc->vc_mode == KD_GRAPHICS) {
> > +   return -EINVAL;
> > +   }
> > +
> > want_console = nr;
> > schedule_console_callback();
> > +
> > +   return 0;
> >  }
> 
> I haven't tested, but I think the above -EINVAL return will break chvt.
> chvt uses the VT_ACTIVATE ioctl which calls set_console(), and it is
> valid for chvt to be used to change away from a graphics-mode console --
> that shouldn't be an error condition.

Currently the VT_ACTIVATE ioctl will eventually result in a call to
change_console() which will ignore the change if the console is in
KD_GRAPHICS+VT_AUTO mode.  Thus, chvt should fail.  However, the return
code from set_console() is ignored in vt_ioctl so no error code will
result (which is the same as current behavior).

-- Andrew


Re: Question: schedule()

2007-03-08 Thread Bill Davidsen

albcamus wrote:

> your kthread IS preemptible unless you call preempt_disable or some
> locking functions explicitly.

I think he's trying to go the other way, make his thread the highest 
priority to blow anything else in the system out of the water. See his 
previous post "how to make kernel thread more faster?"


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Paul Jackson
Matt wrote:
> It's like that Star Trek episode ... except we can't agree on the name

Usually, when there is this much heat and smoke over a name, there is
really an underlying disagreement or misunderstanding over the meaning
of something.

The name becomes the proxy for meaning ;).

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: PAGE_SIZE Availability Inconsistency

2007-03-08 Thread David Miller
From: "H. Peter Anvin" <[EMAIL PROTECTED]>
Date: Thu, 08 Mar 2007 20:31:05 -0800

> The advantage would be that it wouldn't require a v3 for platforms for 
> which MIN_PAGE_SIZE == PAGE_SIZE, which accounts for a very large 
> percentage of systems.
> 
> You still have to look for the darn magic in two places, so there is no 
> reason for it to be different.

Good point.


Re: 2.6.21-rc3-mm2

2007-03-08 Thread Michael

Hello Andrew,

I found a few hiccups in the -mm kernel series since 2.6.21-rc2-mm1/mm2.
1.) This appeared during boot (no VFS mounted at the time):

BUG: at arch/i386/mm/highmem.c:61 kmap_atomic()
[]  []  []  []  []
[]  []  []  []  []
[]  []  []  []  []
[]  []  []  []
===
BUG: at arch/i386/mm/highmem.c:61 kmap_atomic()
[]  []  []  []  []
[]  []  []  []  []
[]  []  []  []  []
[]  []  ===

2.) Some time later, when I ran some ups, I hit this:
BUG: atomic counter underflow at:

[]  []  []  []  []
[]  []  []  []  []
[]  []  []  []  []
===

Then, in 2.6.21-rc3-mm2:

In rc3-mm2, the bug from 1.) in rc2-mm1/mm2 is gone.

But the underflow is still here... *huh*
BUG: atomic counter underflow at:
[]  []  []  []  []
[]  []  []  []  []
[]  []  ===

I also found some misbehaviour in the Attansic "atl1" driver.
Maybe I am addressing this wrongly, but I am not sure who the real maintainer is.
I looked at atl1_main.c, but to be honest there is no obvious
information about to whom or where I should address such issues.

Could you please be so kind as to forward it to the right person?

atl1: hw csum wrong pkt_flag:1600, err_flag:80

None of these hiccups appeared in the latest vanilla kernel
2.6.21-rc2, and even with the latest git updates
(2.6.21-rc3-git3) there is no such behaviour.

Thanks for your patience.

Best regards
Michael


Re: PAGE_SIZE Availability Inconsistency

2007-03-08 Thread H. Peter Anvin

David Miller wrote:
> From: "H. Peter Anvin" <[EMAIL PROTECTED]>
> Date: Thu, 08 Mar 2007 20:18:28 -0800
>
> > Anton Blanchard wrote:
> > > The other option is to create a v3 swap format that doesnt use any
> > > PAGE_SIZE parameters.
> >
> > The best thing to do would be to look for the magic both at PAGE_SIZE
> > (for compatibility) and MIN_PAGE_SIZE (for sanity.)
>
> That might work, but a large part of me says to go for v3
> and do it cleanly.


The advantage would be that it wouldn't require a v3 for platforms for 
which MIN_PAGE_SIZE == PAGE_SIZE, which accounts for a very large 
percentage of systems.


You still have to look for the darn magic in two places, so there is no 
reason for it to be different.


-hpa


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Paul Jackson
> The real trick is that I believe these groupings are designed to be something
> you can setup on login and then not be able to switch out of.  Which means
> we can't use sessions and process groups as the grouping entities as those 
> have different semantics.

Not always on login.  For big administered systems, we use batch schedulers
to manage the placement of multiple jobs, submitted to a run queue by users,
onto the available compute resources.

But I agree with your conclusion - the existing task grouping mechanisms,
while useful for some purposes, don't meet the need here.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: PAGE_SIZE Availability Inconsistency

2007-03-08 Thread David Miller
From: "H. Peter Anvin" <[EMAIL PROTECTED]>
Date: Thu, 08 Mar 2007 20:18:28 -0800

> Anton Blanchard wrote:
> > The other option is to create a v3 swap format that doesnt use any
> > PAGE_SIZE parameters.
> 
> The best thing to do would be to look for the magic both at PAGE_SIZE 
> (for compatibility) and MIN_PAGE_SIZE (for sanity.)

That might work, but a large part of me says to go for v3
and do it cleanly.


NVidia early quirk repair patch bug

2007-03-08 Thread Quinn Storm
In attempting to fix some issues with my system, I was pulling patches from 
the kernel git tree, and I discovered that this patch
(http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=fe69933652562f093ccde600cecf234930c01932)
is malformed on i386, specifically the line:

+   if (acpi_table_parse(ACPI_SIG_HPET, nvidia_hpet_check) {

should read

+   if (acpi_table_parse(ACPI_SIG_HPET, nvidia_hpet_check)) {

I know someone else would have caught this, but I figured since I was doing 
this I'd post about it.

btw, thanks for the wonderful work guys, wanted to mention bcm43xx finally 
works properly on 2.6.21-rc3 w/ my 4311


Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Paul Jackson
> But "namespace" has well-established historical semantics too - a way
> of changing the mappings of local * to global objects. This
> accurately describes things like resource controllers, cpusets, resource
> monitoring, etc.

No!

Cpusets don't rename or change the mapping of objects.

I suspect you seriously misunderstand cpusets and are trying to cram them
into a 'namespace' remapping role into which they don't fit.

So far as cpusets are concerned, CPU #17 is CPU #17, for all tasks,
regardless of what cpuset they are in.  They just might not happen to
be allowed to execute on CPU #17 at the moment, because that CPU is not
allowed by the cpuset they are in.

But they still call it CPU #17.

Similarly the namespace of cpusets and of tasks (pid's) are single
system-wide namespaces, so far as cpusets are concerned.

Cpusets are not about alternative or multiple or variant name spaces.
They are about (considering just CPUs for the moment):
 1) creating a set of maps M0, M1, ... from the set of CPUs to a Boolean,
 2) creating a mapping Q from the set of tasks to these M0, ... maps, and
 3) imposing constraints on where tasks can run, as follows:
For any task t, that task is allowed to run on CPU x iff Q(t)(x)
is True.  Here, Q(t) will be one of the maps M0, ... aka a cpuset.
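
The three steps above can be modelled in a few lines of C. This is only an illustrative sketch, not how cpusets are implemented in the kernel: the table sizes, the bitmask encoding, and every name below are assumptions made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_TASKS 4	/* hypothetical number of tasks in the model */

/* M0, M1, ...: each cpuset maps a CPU number to a Boolean (one bit per CPU). */
static const uint32_t cpuset_map[] = {
	0x0000000fu,	/* M0: CPUs 0-3 */
	0x00030000u,	/* M1: CPUs 16-17 */
};

/* Q: maps each task to exactly one of the maps above. */
static const unsigned int task_cpuset[NR_TASKS] = { 0, 0, 1, 1 };

/* Task t may run on CPU x iff Q(t)(x) is true. */
bool task_allowed_on_cpu(unsigned int t, unsigned int x)
{
	if (t >= NR_TASKS || x >= 32)
		return false;
	return (cpuset_map[task_cpuset[t]] >> x) & 1u;
}
```

Note that CPU 17 is index 17 for every task in this model; only the answer to "may I run there?" differs per task, which is the distinction being drawn against namespace remapping.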

So far as cpusets are concerned, there is only one each of:
 A] a namespace numbering CPUs,
 B] a namespace numbering tasks (the process id),
 C] a namespace naming cpusets (the hierarchical name space normally
mounted at /dev/cpuset, and corresponding to the Mn maps above) and
 D] a mapping of tasks to cpusets, system wide (just a map, not a namespace.)

All tasks (of sufficient authority) can see each of these, using a single
system wide name space for each of [A], [B], and [C].

Unless, that is, you call any mapping a "way of changing mappings".
To do so would be a senseless abuse of the phrase, in my view.

More generally, these resource managers all tend to divide some external
limited physical resource into multiple separately allocatable units.

If the resource is amorphous (one atom or cycle of it is interchangeable
with another) then we usually do something like divide it into 100
equal units and speak of percentages.  If the resource is naturally
subdivided into sufficiently small units (sufficient for the
granularity of resource management we require) then we take those units
as is.  Occasionally, as in the 'fake numa node' patch by David
Rientjes <[EMAIL PROTECTED]>, who worked at Google over the
last summer, if the natural units are not of sufficient granularity, we
fake up a somewhat finer division.

Then, in any case, and somewhat separately, we divide the tasks running
on the system into subsets.  More precisely, we partition the tasks,
where a partition of a set is a set of subsets of that set, pairwise
disjoint, whose union equals that set.

Then, finally, we map the task subsets (partition element) to the
resource units, and add hooks in the kernel where this particular
resource is allocated or scheduled to constrain the tasks to only using
the units to which their task partition element is mapped.

These hooks are usually the 'interesting' part of a resource management
patch; one needs to minimize impact on both the kernel source code and
on the runtime performance, and for these hooks, that can be a
challenge.  In particular, what are naturally system wide resource
management structures cannot be allowed to impose system wide locks on
critical resource allocation code paths (and it's usually the most
critical resources, such as memory, cpu and network, that we most need
to manage in the first place.)

==> This has nothing to do with remapping namespaces, as I might use that
phrase, though I cannot claim to be qualified enough to speak on behalf
of the Generally Established Principles of Computer Science.

I am as qualified as anyone to speak on behalf of cpusets, and I suspect
you are not accurately understanding them if you think of them as remapping
namespaces.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: PAGE_SIZE Availability Inconsistency

2007-03-08 Thread H. Peter Anvin

Anton Blanchard wrote:
> Hi,
>
> > I might be missing something but doesn't this break every
> > SWAP partition that was created with something other than
> > MIN_PAGE_SIZE?
>
> It does. I was thinking we could work around it in ppc64 (64kB is quite
> new), but I forgot there are options on sparc64 to change the page size :)
>
> The other option is to create a v3 swap format that doesnt use any
> PAGE_SIZE parameters.


The best thing to do would be to look for the magic both at PAGE_SIZE 
(for compatibility) and MIN_PAGE_SIZE (for sanity.)
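
A user-space sketch of that double lookup, under the assumption that the 10-byte "SWAP-SPACE" / "SWAPSPACE2" signature sits in the last 10 bytes of the first page, as in the existing swap formats. The function names and the in-memory buffer standing in for the device are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SWAP_MAGIC_LEN 10

/* True if the 10-byte v1/v2 signature ends exactly at offset page_size. */
static bool swap_magic_at(const unsigned char *dev, size_t dev_len,
			  size_t page_size)
{
	const unsigned char *sig;

	if (page_size < SWAP_MAGIC_LEN || dev_len < page_size)
		return false;
	sig = dev + page_size - SWAP_MAGIC_LEN;
	return memcmp(sig, "SWAP-SPACE", SWAP_MAGIC_LEN) == 0 ||
	       memcmp(sig, "SWAPSPACE2", SWAP_MAGIC_LEN) == 0;
}

/*
 * Returns the page size at which the signature was found, or 0 if none.
 * Tries the running kernel's page size first, then the minimum page size.
 */
size_t find_swap_magic(const unsigned char *dev, size_t dev_len,
		       size_t page_size, size_t min_page_size)
{
	if (swap_magic_at(dev, dev_len, page_size))
		return page_size;
	if (swap_magic_at(dev, dev_len, min_page_size))
		return min_page_size;
	return 0;
}
```

On platforms where the two page sizes are equal, the second probe simply repeats the first, so the extra lookup costs nothing there.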


-hpa


Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler

2007-03-08 Thread Bill Davidsen

Con Kolivas wrote:
> On Wednesday 07 March 2007 04:50, Bill Davidsen wrote:
> > With luck I'll get to shake out that patch in combination with kvm later
> > today.
>
> Great thanks!. I've appreciated all the feedback so far.

I did try; 2.6.21-rc3-git3 doesn't want to run kvm for me, and your
patch may not be doing what it should. I'm falling back to 2.6.20 and
will retest after I document my kvm issues.


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


Re: [PATCH -mm] Blackfin: blackfin i2c driver

2007-03-08 Thread Sonic Zhang

On 3/8/07, Jean Delvare <[EMAIL PROTECTED]> wrote:

> Hi Brian,
>
> Thanks for the quick update.
>
> > +
> > + rc = (iface->result >= 0) ? 0 : -1;
> > +
> > + /* Release mutex */
> > + mutex_unlock(&iface->twi_lock);
> > +
> > + return rc;
> > +}
>
> i2c-core can emulate SMBus transactions using master_xfer, so in
> general when you have a complete master_xfer implementation you do not
> need to define a separate smbus_xfer function. This would save a lot of
> code.



Actually, i2c-core can't emulate SMBus transactions using the
master_xfer function here, because the Blackfin TWI controller provides
hardware support for SMBus transactions, and a combination of
master_xfer operations can't generate the proper signals for SMBus.


Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler

2007-03-08 Thread Bill Davidsen

Linus Torvalds wrote:
>
> On Mon, 5 Mar 2007, Ed Tomlinson wrote:
> > The patch _does_ make a difference.  For instance reading mail with
> > freenet working hard  (threaded java application) and gentoo's emerge
> > triggering compiles to update the box is much smoother.
> >
> > Think this scheduler needs serious looking at.
>
> I agree, partly because it's obviously been getting rave reviews so far,
> but mainly because it looks like you can think about behaviour a lot
> better, something that was always very hard with the interactivity
> boosters with process state history.
>
> I'm not at all opposed to this, but we do need:
>  - to not do it at this stage in the stable kernel
>  - to let it sit in -mm for at least a short while
>  - and generally more people testing more loads.

Please, could you now rethink pluggable schedulers as well? Even if one
had to be chosen at boot time and couldn't be changed thereafter, it
would still allow a few new thoughts to be included.

> I don't actually worry too much about switching out a CPU scheduler: those
> things are places where you *can* largely read the source code and get an
> idea for them (although with the kind of history state that we currently
> have, it's really really hard). But at the very least they aren't likely
> to have subtle bugs that show up elsewhere, so...

I confess that the default scheduler works for me most of the time; i/o
tuning is more productive. I want to test with a kvm load, but
2.6.21-rc3-git3 doesn't want to run kvm at all. I'm looking to see what
I broke, since nbd doesn't work, either.

I'm collecting OOPS now, will forward when I have a few more.

> So as long as the generic concerns above are under control, I'll happily
> try something like this if it can be merged early in a merge window..
>
> Linus



--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


RE: Sleeping thread not receive signal until it wakes up

2007-03-08 Thread Parav K Pandit
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Luong Ngo
Sent: Friday, March 09, 2007 8:54 AM
To: Robert Hancock
Cc: linux-kernel; [EMAIL PROTECTED]
Subject: Re: Sleeping thread not receive signal until it wakes up

On 3/8/07, Robert Hancock <[EMAIL PROTECTED]> wrote:
> Luong Ngo wrote:
> > Hi Thomas and Dick,
> > I appreciate all the responses. They are very good information to me.
> > Actually, it wasn't me working on the driver but it's been there long
> > time. I thought I just need to add the signal and signal handling
> > part, not expecting it would lead me to the driver space.
> > Here is what I have in the driver. Maybe a race condition could happen
> > in the scenario where the ioctl releases the lock but, before going to
> > sleep, the ISR is invoked and calls wake-up on the queue, hence the ioctl
> > will not be woken up since the wake-up call already executed. But I
> > believe in our system this could be tolerated, since the hardware would
> > keep raising the interrupt if the abnormal condition still exists (due to
> > the ioctl being blocked, so the user app never gets a chance to service
> > the device). But is this the reason why my signal handler does not get
> > executed at all? Theoretically, according to the Richard Stevens book, I
> > think the process should be woken up and receive the signal even if it
> > gets blocked in the IOCTL call, am I right?
>
> ..
>
> > static int ats89_ioctl(struct inode *inode, struct file *file, u_int
> > cmd, u_long arg)
> > {
> >
> >  switch(cmd){
> >   case GET_IRQ_CMD: {
> >u32  regMask32;
> >
> >   spin_lock_irq(dev->lock);
> >   while ((dev->irqMask & dev->irqEvent) == 0) {
> > // Sleep until board interrupt happens
> > spin_unlock_irq(dev->lock);
> > interruptible_sleep_on(&(dev->boardIRQWaitQueue));
> > if (uncond_wakeup) {
> > /* don't go back to loop */
> > break;
> > }
> > spin_lock_irq(dev->lock);
> > }
>
> Kernel code does not get pre-empted by signals. If the code needs to be
> interruptible by signals this has to be handled explicitly.
> interruptible_sleep_on will stop waiting if your task gets a signal, but
> your code doesn't check the signal_pending flag to know whether it
> should exit the loop. If signal_pending(current) is set after the sleep
> you should likely be returning -ERESTARTSYS to allow the task to handle
> the signal. Then after the signal handler from the task returns, the
> ioctl will get called again.
>
> Also, as was pointed out, you should not use the sleep_on family of
> functions; use the wait_event functions instead. sleep_on is racy: if the
> interrupt happened just before you do the sleep, you'll sit there
> waiting for something that already occurred.
>
> --
> Robert Hancock  Saskatoon, SK, Canada
> To email, remove "nospam" from [EMAIL PROTECTED]
> Home Page: http://www.roberthancock.com/
>
>
> Robert, thanks a lot for your suggestion.
> But I have added the signal_pending(current) check and the signal handler
> is still not invoked:

>   spin_lock_irq(dev->lock);
>   while ((dev->irqMask & dev->irqEvent) == 0) {
> // Sleep until board interrupt happens
> spin_unlock_irq(dev->lock);
> interruptible_sleep_on(&(dev->boardIRQWaitQueue));
>
> if (signal_pending(current)) {
>   return -ERESTARTSYS;
> }
>
> if (uncond_wakeup) {
> /* don't go back to loop */
> break;
> }
> spin_lock_irq(dev->lock);
> }
>Still no luck yet.
>LNgo
>-

I guess you need to call allow_signal(xxx) before you go to sleep.
Parav

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: any thoughts yet on a "generic" ioctl.h?

2007-03-08 Thread Stefan Richter
Robert P. J. Day wrote:
>   i asked about this a while back, but i still haven't heard a
> definitive response as to whether it's acceptable.

Maybe you'll get a response if you post a complete patch.
-- 
Stefan Richter
-=-=-=== --== -=--=
http://arcgraph.de/sr/


[PATCH]RTC: add rtc-rs5c313 driver

2007-03-08 Thread Nobuhiro Iwamatsu
Hi all,

Add an RTC driver for the Ricoh RS5C313 RTC chip.

Please apply.

regards, 
 Nobuhiro

Signed-off-by: Nobuhiro Iwamatsu <[EMAIL PROTECTED]>

diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig
index 95826b9..cc3c0b2 100644
--- a/drivers/rtc/Kconfig
+++ b/drivers/rtc/Kconfig
@@ -354,4 +354,10 @@ config RTC_DRV_V3020
  This driver can also be built as a module. If so, the module
  will be called rtc-v3020.
 
+config RTC_DRV_RS5C313
+   tristate "Ricoh RS5C313"
+   depends on RTC_CLASS
+   help
+ If you say yes here you get support for the Ricoh RS5C313 RTC chips.
+
 endmenu
diff --git a/drivers/rtc/Makefile b/drivers/rtc/Makefile
index 92bfe1b..9af3129 100644
--- a/drivers/rtc/Makefile
+++ b/drivers/rtc/Makefile
@@ -38,3 +38,4 @@ obj-$(CONFIG_RTC_DRV_MAX6902) += rtc-max6902.o
 obj-$(CONFIG_RTC_DRV_V3020)+= rtc-v3020.o
 obj-$(CONFIG_RTC_DRV_AT91RM9200)+= rtc-at91rm9200.o
 obj-$(CONFIG_RTC_DRV_SH)   += rtc-sh.o
+obj-$(CONFIG_RTC_DRV_RS5C313)  += rtc-rs5c313.o
diff --git a/drivers/rtc/rtc-rs5c313.c b/drivers/rtc/rtc-rs5c313.c
new file mode 100644
index 000..8751d2b
--- /dev/null
+++ b/drivers/rtc/rtc-rs5c313.c
@@ -0,0 +1,408 @@
+/*
+ * Ricoh RS5C313 RTC device/driver
+ *  Copyright (C) 2007 Nobuhiro Iwamatsu
+ *
+ *  2005-09-19 modified by kogiidena
+ *
+ * Based on the old drivers/char/rs5c313_rtc.c  by:
+ *  Copyright (C) 2000 Philipp Rumpf <[EMAIL PROTECTED]>
+ *  Copyright (C) 1999 Tetsuya Okada & Niibe Yutaka
+ *
+ * Based on code written by Paul Gortmaker.
+ *  Copyright (C) 1996 Paul Gortmaker
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Based on other minimal char device drivers, like Alan's
+ * watchdog, Ted's random, etc. etc.
+ *
+ * 1.07    Paul Gortmaker.
+ * 1.08    Miquel van Smoorenburg: disallow certain things on the
+ *         DEC Alpha as the CMOS clock is also used for other things.
+ * 1.09    Nikita Schmidt: epoch support and some Alpha cleanup.
+ * 1.09a   Pete Zaitcev: Sun SPARC
+ * 1.09b   Jeff Garzik: Modularize, init cleanup
+ * 1.09c   Jeff Garzik: SMP cleanup
+ * 1.10    Paul Barton-Davis: add support for async I/O
+ * 1.10a   Andrea Arcangeli: Alpha updates
+ * 1.10b   Andrew Morton: SMP lock fix
+ * 1.10c   Cesar Barros: SMP locking fixes and cleanup
+ * 1.10d   Paul Gortmaker: delete paranoia check in rtc_exit
+ * 1.10e   Maciej W. Rozycki: Handle DECstation's year weirdness.
+ * 1.11    Takashi Iwai: Kernel access functions
+ *         rtc_register/rtc_unregister/rtc_control
+ * 1.11a   Daniele Bellucci: Audit create_proc_read_entry in rtc_init
+ * 1.12    Venkatesh Pallipadi: Hooks for emulating rtc on HPET base-timer
+ *         CONFIG_HPET_EMULATE_RTC
+ * 1.13    Nobuhiro Iwamatsu: Update driver.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DRV_NAME   "rs5c313"
+#define DRV_VERSION    "1.13"
+
+#ifdef CONFIG_SH_LANDISK
+/*/
+/* LANDISK dependence part of RS5C313*/
+/*/
+
+#define SCSMR1 0xFFE0
+#define SCSCR1 0xFFE8
+#define SCSMR1_CA  0x80
+#define SCSCR1_CKE 0x03
+#define SCSPTR1    0xFFE0001C
+#define SCSPTR1_EIO    0x80
+#define SCSPTR1_SPB1IO 0x08
+#define SCSPTR1_SPB1DT 0x04
+#define SCSPTR1_SPB0IO 0x02
+#define SCSPTR1_SPB0DT 0x01
+
+#define SDA_OEN    SCSPTR1_SPB1IO
+#define SDA        SCSPTR1_SPB1DT
+#define SCL_OEN    SCSPTR1_SPB0IO
+#define SCL        SCSPTR1_SPB0DT
+
+/* RICOH RS5C313 CE port */
+#define RS5C313_CE 0xB003
+
+/* RICOH RS5C313 CE port bit */
+#define RS5C313_CE_RTCCE   0x02
+
+/* SCSPTR1 data */
+unsigned char scsptr1_data;
+
+#define RS5C313_CEENABLE    ctrl_outb(RS5C313_CE_RTCCE, RS5C313_CE);
+#define RS5C313_CEDISABLE   ctrl_outb(0x00, RS5C313_CE)
+#define RS5C313_MISCOP  ctrl_outb(0x02, 0xB008)
+
+static void rs5c313_init_port(void)
+{
+   /* Set SCK as I/O port and Initialize SCSPTR1 data & I/O port. */
+   ctrl_outb(ctrl_inb(SCSMR1) & ~SCSMR1_CA, SCSMR1);
+   ctrl_outb(ctrl_inb(SCSCR1) & ~SCSCR1_CKE, SCSCR1);
+
+   /* And Initialize SCL for RS5C313 clock */
+   scsptr1_data = ctrl_inb(SCSPTR1) | SCL; /* SCL:H */
+   ctrl_outb(scsptr1_data, SCSPTR1);
+   scsptr1_data = ctrl_inb(SCSPTR1) | SCL_OEN; /* SCL output enable */
+   ctrl_outb(scsptr1_data, SCSPTR1);
+   RS5C313_CEDISABLE;  /* CE:L */
+}
+
+static void rs5c313_write_data(unsigned char data)
+{
+   int i = 0;
+
+   for(i = 0; i < 8; i++){
+   /* SDA:Write Data */
+   scsptr1_data = (scsptr1_data & ~SDA)
+   | 0x80 >> i) & data) >> (7 - i)) 

Re: [PATCH] Complain about missing system calls.

2007-03-08 Thread Anton Blanchard

Hi,

> Most system calls seem to get added to i386 first. This patch
> automatically generates a warning for any new system call which is
> implemented on i386 but not the architecture currently being compiled.
> On PowerPC at the moment, for example, it results in these warnings:

Love it!

...

> Thanks for the update. Quite why the PowerPC kernel defines system
> call numbers for all of these I have no idea :)

BTW while unistd.h may refer to sys_iopl etc, the actual syscall is sent
to ni_syscall in most of these useless cases.

Anton


[PATCH 9/9] lguest: don't crash host on NMI

2007-03-08 Thread Rusty Russell
"handle" NMI by ignoring it.  Can't have been important, right?  As the
lguest64 hackers explained, handling NMI is a PITA.  Now oprofile does
not crash the machine.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r 5beeb29ed3a3 arch/i386/lguest/hypervisor.S
--- a/arch/i386/lguest/hypervisor.S Wed Feb 28 09:59:23 2007 +1100
+++ b/arch/i386/lguest/hypervisor.S Wed Feb 28 09:59:47 2007 +1100
@@ -116,6 +116,11 @@ deliver_to_host:
orl %eax, %edx
jmp *%edx
 
+/* We ignore NMI and return. */
+handle_nmi:
+   addl    $8, %esp
+   iret
+
 /* Real hardware interrupts are delivered straight to the host.  Others
cause us to return to run_guest_once so it can decide what to do.  Note
that some of these are overridden by the guest to deliver directly, and
@@ -148,8 +153,7 @@ default_idt_entries:
 default_idt_entries:
 .text
IRQ_STUBS 0 1 return_to_host/* First two traps */
-/* FIXME: NMI needs something completely different.  Don't SWITCH_TO_HOST. */
-   IRQ_STUB 2 deliver_to_host  /* NMI */
+   IRQ_STUB 2 handle_nmi   /* NMI */
IRQ_STUBS 3 31 return_to_host   /* Rest of traps */
IRQ_STUBS 32 127 deliver_to_host/* Real interrupts */
IRQ_STUB 128 return_to_host /* System call (overridden) */




[PATCH 8/9] lguest: Optimize away copy in and out of per-cpu guest pages

2007-03-08 Thread Rusty Russell
Rather than copying in the IDT, GDT and TSS every time, we only need to
do it when something has changed (i.e. the guest IDT/GDT/TSS has changed,
the guest has changed CPU, or the CPU has just run another guest).

For the registers, we simply allocate them an entire page and map that
over the stack page in the guest.

This restores context switch speed to be comparable to the old
segment-using lguest.
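
The change tracking described above can be sketched in plain userspace C.
This is only an illustration under stated assumptions, not the lguest code:
the struct layouts, the table sizes and the numeric values of the CHANGED_*
bits are made up here (the patch only shows their names), and the "per-cpu
last guest" pointer is folded into the pages structure for brevity.

```c
#include <assert.h>
#include <string.h>

/* Assumed bit values; the patch only shows the names. */
#define CHANGED_IDT 1u
#define CHANGED_GDT 2u
#define CHANGED_ALL (CHANGED_IDT | CHANGED_GDT)

struct cpu_pages;

/* Hypothetical, much-reduced stand-in for struct lguest. */
struct guest {
    unsigned int changed;               /* which tables are stale */
    const struct cpu_pages *last_pages; /* pages we last ran on */
    int idt[4];
    int gdt[4];
};

/* Hypothetical stand-in for one CPU's pages. */
struct cpu_pages {
    const struct guest *last_guest;     /* guest that last ran here */
    int idt[4];
    int gdt[4];
};

/* Copy only what is stale; returns how many tables were copied. */
int copy_in_guest_info(struct cpu_pages *pages, struct guest *g)
{
    int copies = 0;

    assert(pages != NULL && g != NULL);

    /* Different guest on this CPU, or guest moved CPU: all is stale. */
    if (pages->last_guest != g || g->last_pages != pages) {
        pages->last_guest = g;
        g->last_pages = pages;
        g->changed = CHANGED_ALL;
    }

    if (g->changed & CHANGED_IDT) {
        memcpy(pages->idt, g->idt, sizeof(pages->idt));
        copies++;
    }
    if (g->changed & CHANGED_GDT) {
        memcpy(pages->gdt, g->gdt, sizeof(pages->gdt));
        copies++;
    }

    g->changed = 0;
    return copies;
}
```

In this model, running the same guest twice in a row on the same pages
copies nothing the second time, while switching guests or marking a table
as changed triggers only the copies that are needed.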

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r 8286b7923a5b arch/i386/lguest/core.c
--- a/arch/i386/lguest/core.c   Fri Mar 09 13:09:39 2007 +1100
+++ b/arch/i386/lguest/core.c   Fri Mar 09 13:09:48 2007 +1100
@@ -37,6 +37,7 @@ static struct {
unsigned short segment;
 } lguest_entry __attribute_used__;
 DEFINE_MUTEX(lguest_lock);
+static DEFINE_PER_CPU(struct lguest *, last_guest);
 
 /* FIXME: Make dynamic. */
 #define MAX_LGUEST_GUESTS 16
@@ -144,10 +145,10 @@ static int emulate_insn(struct lguest *l
 {
u8 insn;
unsigned int insnlen = 0, in = 0, shift = 0;
-   unsigned long physaddr = guest_pa(lg, lg->regs.eip);
+   unsigned long physaddr = guest_pa(lg, lg->regs->eip);
 
/* This only works for addresses in linear mapping... */
-   if (lg->regs.eip < lg->page_offset)
+   if (lg->regs->eip < lg->page_offset)
return 0;
lhread(lg, , physaddr, 1);
 
@@ -180,11 +181,11 @@ static int emulate_insn(struct lguest *l
if (in) {
/* Lower bit tells is whether it's a 16 or 32 bit access */
if (insn & 0x1)
-   lg->regs.eax = 0x;
+   lg->regs->eax = 0x;
else
-   lg->regs.eax |= (0x << shift);
-   }
-   lg->regs.eip += insnlen;
+   lg->regs->eax |= (0x << shift);
+   }
+   lg->regs->eip += insnlen;
return 1;
 }
 
@@ -260,36 +261,35 @@ static void run_guest_once(struct lguest
 : "memory", "%edx", "%ecx", "%edi", "%esi");
 }
 
-static void copy_in_guest_info(struct lguest_pages *pages,
-  struct lguest *lg)
-{
-   /* Copy in regs. */
-   pages->regs = lg->regs;
-
-   /* TSS entries for direct traps. */
+static void copy_in_guest_info(struct lguest_pages *pages, struct lguest *lg)
+{
+   if (__get_cpu_var(last_guest) != lg || lg->last_pages != pages) {
+   __get_cpu_var(last_guest) = lg;
+   lg->last_pages = pages;
+   lg->changed = CHANGED_ALL;
+   }
+
+   /* These are pretty cheap, so we do them unconditionally. */
+   pages->state.host_cr3 = __pa(current->mm->pgd);
+   map_hypervisor_in_guest(lg, pages);
pages->state.guest_tss.esp1 = lg->esp1;
pages->state.guest_tss.ss1 = lg->ss1;
 
-   /* CR3 */
-   pages->state.host_cr3 = __pa(current->mm->pgd);
-
/* Copy direct trap entries. */
-   copy_traps(lg, pages->state.guest_idt, lguest_default_idt_entries());
+   if (lg->changed & CHANGED_IDT)
+   copy_traps(lg, pages->state.guest_idt,
+  lguest_default_idt_entries());
 
/* Copy all GDT entries but the TSS. */
-   copy_gdt(lg, pages->state.guest_gdt);
-}
-
-static void copy_out_guest_info(struct lguest *lg,
-   const struct lguest_pages *pages)
-{
-   /* We just want the regs back. */
-   lg->regs = pages->regs;
+   if (lg->changed & CHANGED_GDT)
+   copy_gdt(lg, pages->state.guest_gdt);
+
+   lg->changed = 0;
 }
 
 int run_guest(struct lguest *lg, char *__user user)
 {
-   struct lguest_regs *regs = >regs;
+   struct lguest_regs *regs = lg->regs;
 
while (!lg->dead) {
unsigned int cr2 = 0; /* Damn gcc */
@@ -327,10 +327,8 @@ int run_guest(struct lguest *lg, char *_
set_ts(lg->ts);
 
pages = lguest_pages(raw_smp_processor_id());
-   map_hypervisor_in_guest(lg);
copy_in_guest_info(pages, lg);
run_guest_once(lg, pages);
-   copy_out_guest_info(lg, pages);
 
/* Save cr2 now if we page-faulted. */
if (regs->trapnum == 14)
diff -r 8286b7923a5b arch/i386/lguest/hypervisor.S
--- a/arch/i386/lguest/hypervisor.S Fri Mar 09 13:09:39 2007 +1100
+++ b/arch/i386/lguest/hypervisor.S Fri Mar 09 13:15:43 2007 +1100
@@ -76,6 +76,8 @@ switch_to_guest:
/* Figure out where we are, based on stack (at top of regs). */ \
    movl    %esp, %eax; \
    subl    $LGUEST_PAGES_regs, %eax;   \
+   /* Put trap number in %ebx before we switch cr3 and lose it. */ \
+   movl    LGUEST_PAGES_regs_trapnum(%eax), %ebx;  \
/* Switch to host page tables (host GDT, IDT and stack are in host   \
   mem, so need this first) */  \
movl

Re: [RFC] [Patch 1/1] IBAC Patch

2007-03-08 Thread Valdis . Kletnieks
On Thu, 08 Mar 2007 17:58:16 EST, Mimi Zohar said:
> This is a request for comments for a new Integrity Based Access
> Control(IBAC) LSM module which bases access control decisions
> on the new integrity framework services. 
> 
> (Hopefully this will help clarify the interaction between an LSM 
> module and LIM module.)

OK, between this and the additional LIM hooks I didn't notice in an earlier
patch, we're starting to see the API.   The only problem is that although
it may be the right API for *your* code, I suspect it's a non-starter without
a discussion about whether it's the right *generic* API for an LIM (which will
require at least one dramatic bun fight about what "Integrity" means).

> Index: linux-2.6.21-rc3-mm2/security/ibac/Kconfig
 
Minor cognitive-dissonance alert:

> +config SECURITY_IBAC_BOOTPARAM
> + bool "IBAC boot parameter"
> + depends on SECURITY_IBAC
> + default y

> +   If you are unsure how to answer this question, answer N.

The 'default' should in general match the hint we give the user.




Re: Sleeping thread not receive signal until it wakes up

2007-03-08 Thread Luong Ngo

On 3/8/07, Robert Hancock <[EMAIL PROTECTED]> wrote:

Luong Ngo wrote:
> Hi Thomas and Dick,
> I appreciate all the responses. They are very good information to me.
> Actually, it wasn't me working on the driver but it's been there long
> time. I thought I just need to add the signal and signal handling
> part, not expecting it would lead me to the driver space.
> Here is what I have in the driver. Maybe a race condition could happen
> in the scenario where the ioctl releases the lock but, before it goes to
> sleep, the ISR is invoked and calls wake-up on the queue; hence the ioctl
> will not be woken up, since the wake-up call has already executed. But I
> believe in our system this could be tolerated, since the hardware would
> keep raising the interrupt if the abnormal condition still exists (due to
> the ioctl being blocked, the user app never gets a chance to service the
> device). But is this the reason why my signal handler does not get
> executed at all? Theoretically, according to the Richard Stevens book, I
> think the process should be woken up and receive the signal even if it is
> blocked in the ioctl call, am I right?

..

> static int ats89_ioctl(struct inode *inode, struct file *file, u_int
> cmd, u_long arg)
> {
>
>  switch(cmd){
>   case GET_IRQ_CMD: {
>u32  regMask32;
>
>   spin_lock_irq(dev->lock);
>   while ((dev->irqMask & dev->irqEvent) == 0) {
> // Sleep until board interrupt happens
> spin_unlock_irq(dev->lock);
> interruptible_sleep_on(&(dev->boardIRQWaitQueue));
> if (uncond_wakeup) {
> /* don't go back to loop */
> break;
> }
> spin_lock_irq(dev->lock);
> }

Kernel code does not get pre-empted by signals. If the code needs to be
interruptible by signals this has to be handled explicitly.
interruptible_sleep_on will stop waiting if your task gets a signal, but
your code doesn't check the signal_pending flag to know whether it
should exit the loop. If signal_pending(current) is set after the sleep
you should likely be returning -ERESTARTSYS to allow the task to handle
the signal. Then after the signal handler from the task returns, the
ioctl will get called again.

Also, as was pointed out, you should not use the sleep_on family of
functions; use the wait_event functions instead. sleep_on is racy: if the
interrupt happened just before you do the sleep, you'll sit there
waiting for something that already occurred.

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Robert, thanks a lot for your suggestion.
But I have added the signal_pending(current) check and the signal handler
is still not invoked:

  spin_lock_irq(dev->lock);
  while ((dev->irqMask & dev->irqEvent) == 0) {
// Sleep until board interrupt happens
spin_unlock_irq(dev->lock);
interruptible_sleep_on(&(dev->boardIRQWaitQueue));

if (signal_pending(current)) {
  return -ERESTARTSYS;
}

if (uncond_wakeup) {
/* don't go back to loop */
break;
}
spin_lock_irq(dev->lock);
}
Still no luck yet.
LNgo
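
For reference, the race Robert describes (the wake-up arriving between
dropping the lock and going to sleep) can be modelled in userspace. The
sketch below is an analogy only, using POSIX condition variables rather
than kernel wait queues; struct irq_waiter and all function names are
hypothetical. The point it illustrates is the one behind the wait_event
family: the condition is re-checked with the lock held, and the
unlock-and-sleep step is atomic, so an event posted before the wait is
not lost.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's per-device state. */
struct irq_waiter {
    pthread_mutex_t lock;
    pthread_cond_t wakeup;
    bool event_pending;     /* plays the role of irqMask & irqEvent */
};

void irq_waiter_init(struct irq_waiter *w)
{
    int rc = pthread_mutex_init(&w->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&w->wakeup, NULL);
    assert(rc == 0);
    (void)rc;
    w->event_pending = false;
}

/* "ISR" side: record the event under the lock, then wake any sleeper. */
void post_event(struct irq_waiter *w)
{
    pthread_mutex_lock(&w->lock);
    w->event_pending = true;
    pthread_cond_signal(&w->wakeup);
    pthread_mutex_unlock(&w->lock);
}

/*
 * "ioctl" side: the predicate is tested with the lock held, and
 * pthread_cond_wait() releases the lock and sleeps atomically, so
 * there is no window in which a wake-up can slip past unnoticed.
 */
bool wait_for_event(struct irq_waiter *w)
{
    pthread_mutex_lock(&w->lock);
    while (!w->event_pending)
        pthread_cond_wait(&w->wakeup, &w->lock);
    w->event_pending = false;   /* consume the event */
    pthread_mutex_unlock(&w->lock);
    return true;
}

/* Post first, wait second: the ordering that hangs with sleep_on(). */
bool event_posted_before_wait_is_seen(void)
{
    struct irq_waiter w;
    bool seen;

    irq_waiter_init(&w);
    post_event(&w);             /* nobody is sleeping yet */
    seen = wait_for_event(&w);  /* returns at once: predicate is true */
    pthread_cond_destroy(&w.wakeup);
    pthread_mutex_destroy(&w.lock);
    return seen;
}
```

In the driver itself the corresponding change would presumably be to
replace the open-coded loop with wait_event_interruptible() on
boardIRQWaitQueue and to return -ERESTARTSYS when it reports that a
signal interrupted the wait.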


[PATCH 7/9] lguest: use read-only pages rather than segments to protect high-mapped switcher

2007-03-08 Thread Rusty Russell
The current lguest uses segment limits to ensure that the guest cannot
reach the switcher code at the top of virtual memory.  This is bad for
two reasons:

1) It introduces complexity when the guest wants to use 4G segments
(ie. glibc's __thread support).
2) It doesn't work on x86-64 boxes.

The alternative is used here: in the host we map the actual switcher
code and two per-cpu pages.  The switcher code and one per-cpu page are
read-only: the read-only page contains the saved host state and the
GDT, IDT and TSS the guest is using.  The other per-cpu page is the
stack page for the hypervisor, which is writable by the guest.  This
is where we save the guest registers: it's safe because while we're
doing this we know the (UP) guest isn't running.

Switching into the guest involves copying in the registers, GDT and
IDT to this cpu's pages, then copying the registers out on the way
back.  This is optimized in another patch.
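
As a rough illustration of the page budget this layout implies (the
shared switcher blob rounded up to whole pages, plus two pages per CPU,
matching the hype_page[] array size in the patch), here is a small
standalone sketch. The function names are hypothetical and the sizes in
the usage note are made-up examples, not real measurements.

```c
#include <assert.h>
#include <stddef.h>

/* Same arithmetic as the kernel's DIV_ROUND_UP() macro. */
size_t div_round_up(size_t n, size_t d)
{
    assert(d != 0);
    return (n + d - 1) / d;
}

/*
 * Pages needed: the switcher blob (shared by every guest), then one
 * read-only state page and one writable stack/regs page per CPU.
 */
size_t hypervisor_page_count(size_t blob_size, size_t page_size,
                             size_t nr_cpus)
{
    return div_round_up(blob_size, page_size) + 2 * nr_cpus;
}
```

For example, a 5000-byte blob with 4096-byte pages on a two-CPU box
needs 2 shared pages plus 4 per-cpu pages, 6 in total.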

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r 7a963f6eef0a arch/i386/kernel/asm-offsets.c
--- a/arch/i386/kernel/asm-offsets.cThu Mar 08 17:01:08 2007 +1100
+++ b/arch/i386/kernel/asm-offsets.cThu Mar 08 17:21:16 2007 +1100
@@ -122,15 +122,15 @@ void foo(void)
 #ifdef CONFIG_LGUEST_GUEST
BLANK();
OFFSET(LGUEST_DATA_irq_enabled, lguest_data, irq_enabled);
-   OFFSET(LGUEST_STATE_host_stackptr, lguest_state, host.stackptr);
-   OFFSET(LGUEST_STATE_host_pgdir, lguest_state, host.pgdir);
-   OFFSET(LGUEST_STATE_host_gdt, lguest_state, host.gdt);
-   OFFSET(LGUEST_STATE_host_idt, lguest_state, host.idt);
-   OFFSET(LGUEST_STATE_regs, lguest_state, regs);
-   OFFSET(LGUEST_STATE_gdt, lguest_state, gdt);
-   OFFSET(LGUEST_STATE_idt, lguest_state, idt);
-   OFFSET(LGUEST_STATE_gdt_table, lguest_state, gdt_table);
-   OFFSET(LGUEST_STATE_trapnum, lguest_state, regs.trapnum);
-   OFFSET(LGUEST_STATE_errcode, lguest_state, regs.errcode);
+   OFFSET(LGUEST_PAGES_host_gdt_desc, lguest_pages, state.host_gdt_desc);
+   OFFSET(LGUEST_PAGES_host_idt_desc, lguest_pages, state.host_idt_desc);
+   OFFSET(LGUEST_PAGES_host_cr3, lguest_pages, state.host_cr3);
+   OFFSET(LGUEST_PAGES_host_sp, lguest_pages, state.host_sp);
+   OFFSET(LGUEST_PAGES_guest_gdt_desc, lguest_pages,state.guest_gdt_desc);
+   OFFSET(LGUEST_PAGES_guest_idt_desc, lguest_pages,state.guest_idt_desc);
+   OFFSET(LGUEST_PAGES_guest_gdt, lguest_pages, state.guest_gdt);
+   OFFSET(LGUEST_PAGES_regs_trapnum, lguest_pages, regs.trapnum);
+   OFFSET(LGUEST_PAGES_regs_errcode, lguest_pages, regs.errcode);
+   OFFSET(LGUEST_PAGES_regs, lguest_pages, regs);
 #endif
 }
diff -r 7a963f6eef0a arch/i386/lguest/Makefile
--- a/arch/i386/lguest/Makefile Thu Mar 08 17:01:08 2007 +1100
+++ b/arch/i386/lguest/Makefile Thu Mar 08 17:21:16 2007 +1100
@@ -6,8 +6,8 @@ lg-objs := core.o hypercalls.o page_tabl
 lg-objs := core.o hypercalls.o page_tables.o interrupts_and_traps.o \
segments.o io.o lguest_user.o
 
-# We use top 4MB for guest traps page, then hypervisor. */
-HYPE_ADDR := (0xFFC0+4096)
+# We use top 4MB for hypervisor. */
+HYPE_ADDR := 0xFFC0
 # The data is only 1k (256 interrupt handler pointers)
 HYPE_DATA_SIZE := 1024
 CFLAGS += -DHYPE_ADDR="$(HYPE_ADDR)" -DHYPE_DATA_SIZE="$(HYPE_DATA_SIZE)"
diff -r 7a963f6eef0a arch/i386/lguest/core.c
--- a/arch/i386/lguest/core.c   Thu Mar 08 17:01:08 2007 +1100
+++ b/arch/i386/lguest/core.c   Fri Mar 09 13:09:27 2007 +1100
@@ -24,26 +24,26 @@ static char __initdata hypervisor_blob[]
 #include "hypervisor-blob.c"
 };
 
-/* 64k ought to be enough for anybody! */
-#define HYPERVISOR_PAGES (65536 / PAGE_SIZE)
-
-#define MAX_LGUEST_GUESTS  \
-   (((HYPERVISOR_PAGES * PAGE_SIZE) - sizeof(hypervisor_blob)) \
-/ sizeof(struct lguest_state))
+/* Every guest maps the core hypervisor blob. */
+#define SHARED_HYPERVISOR_PAGES DIV_ROUND_UP(sizeof(hypervisor_blob),PAGE_SIZE)
 
 static struct vm_struct *hypervisor_vma;
-/* Pages for hypervisor itself */
-static struct page *hype_page[HYPERVISOR_PAGES];
+/* Pages for hypervisor itself, then two pages per cpu */
+static struct page *hype_page[SHARED_HYPERVISOR_PAGES+2*NR_CPUS];
+
 static int cpu_had_pge;
 static struct {
unsigned long offset;
unsigned short segment;
 } lguest_entry __attribute_used__;
+DEFINE_MUTEX(lguest_lock);
+
+/* FIXME: Make dynamic. */
+#define MAX_LGUEST_GUESTS 16
 struct lguest lguests[MAX_LGUEST_GUESTS];
-DEFINE_MUTEX(lguest_lock);
 
 /* IDT entries are at start of hypervisor. */
-const unsigned long *__lguest_default_idt_entries(void)
+static const unsigned long *lguest_default_idt_entries(void)
 {
return (void *)HYPE_ADDR;
 }
@@ -54,10 +54,11 @@ static void *__lguest_switch_to_guest(vo
return (void *)HYPE_ADDR + HYPE_DATA_SIZE;
 }
 
-/* Then we use everything else to hold guest state. */
-struct lguest_state *__lguest_states(void)
-{
-

Re: KVM and rtc missing interupts 2.6.21-rc3

2007-03-08 Thread Roland Dreier
 > > Try running your guest with -no-kvm (and even with the kvm module not
 > > loaded, just to be sure).  In my case I still saw the messages.
 > > However, removing the "-net tap" line from my command line did get rid
 > > of the messages.
 > 
 > Hmmm I'm getting the same thing on 2.6.20.1 as well.

With or without kvm?

 - R.


Re: KVM and rtc missing interupts 2.6.21-rc3

2007-03-08 Thread Roland Dreier
 > Holding a lock too long should be easy to debug with the -rt patch -
 > set the preemption related options to match mainline, and enable the
 > latency tracer.

Good idea, I'll give it a shot when I get a chance.


[PATCH 6/9] lguest: pin stack page optimization

2007-03-08 Thread Rusty Russell
We make sure that the stack is always mapped in pin_stack_pages by
simply calling demand_page, but that calls get_user_pages() to find
the pfn, which is way overkill since the page is almost certainly
already mapped.  So don't call pin_stack_pages on every context switch
(unless it is genuinely a completely clean context, since all the kernel
mappings are kept in sync), and when we do call it, have it check whether
it needs to call demand_page().

This speeds up guest context switches by about 25%:

Before:
Time for one context switch via pipe: 10606 nsec
After:
Time for one context switch via pipe: 7805 nsec
Native:
Time for one context switch via pipe: 4701 nsec
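
The fast path amounts to a two-level bit test. The following standalone
sketch shows the check on raw i386 page table entries; it is not the
patch's page_writable() (which walks the shadow tables via toplev() and
pteof()), and the function and macro names here are hypothetical. The
bit positions are the architectural i386 ones: bit 0 is "present" and
bit 1 is "read/write".

```c
#include <stdint.h>

#define X86_PAGE_PRESENT 0x001u  /* bit 0: entry is valid */
#define X86_PAGE_RW      0x002u  /* bit 1: writes allowed */

/*
 * Returns 1 only if the top-level entry is present and the PTE is both
 * present and writable.  Note the "== mask" comparison: testing
 * "(pte & mask) != 0" would wrongly accept a present read-only page.
 */
int entry_writable(uint32_t top_entry, uint32_t pte)
{
    const uint32_t mask = X86_PAGE_PRESENT | X86_PAGE_RW;

    if (!(top_entry & X86_PAGE_PRESENT))
        return 0;
    return (pte & mask) == mask;
}
```

Only when this test fails does the slow path (demand_page() in the
patch) need to run.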

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r 06b3a533da77 arch/i386/lguest/page_tables.c
--- a/arch/i386/lguest/page_tables.cWed Feb 21 12:20:20 2007 +1100
+++ b/arch/i386/lguest/page_tables.cWed Feb 21 18:13:00 2007 +1100
@@ -155,14 +158,29 @@ int demand_page(struct lguest *lg, u32 v
return page_in(lg, vaddr, (write ? _PAGE_DIRTY : 0)|_PAGE_ACCESSED);
 }
 
+/* This is much faster than the full demand_page logic. */
+static int page_writable(struct lguest *lg, unsigned long vaddr)
+{
+   u32 *top, *pte;
+
+   top = toplev(lg, lg->pgdidx, vaddr);
+   if (!(*top & _PAGE_PRESENT))
+   return 0;
+
+   pte = pteof(lg, *top, vaddr);
+   return (*pte & (_PAGE_PRESENT|_PAGE_RW)) == (_PAGE_PRESENT|_PAGE_RW);
+}
+
 void pin_stack_pages(struct lguest *lg)
 {
unsigned int i;
u32 stack = lg->state->tss.esp1;
 
-   for (i = 0; i < lg->stack_pages; i++)
-   if (!demand_page(lg, stack - i*PAGE_SIZE, 1))
+   for (i = 0; i < lg->stack_pages; i++) {
+   if (!page_writable(lg, stack - i * PAGE_SIZE)
+   && !demand_page(lg, stack - i * PAGE_SIZE, 1))
kill_guest(lg, "bad stack page [EMAIL PROTECTED]", i, 
stack);
+   }
 }
 
 static unsigned int find_pgdir(struct lguest *lg, u32 pgtable)
@@ -198,7 +216,7 @@ void guest_pagetable_flush_user(struct l
flush_user_mappings(lg, lg->pgdidx);
 }
 
-static unsigned int new_pgdir(struct lguest *lg, u32 cr3)
+static unsigned int new_pgdir(struct lguest *lg, u32 cr3, int *blank_pgdir)
 {
unsigned int next;
 
@@ -207,6 +225,9 @@ static unsigned int new_pgdir(struct lgu
lg->pgdirs[next].pgdir = (u32 *)get_zeroed_page(GFP_KERNEL);
if (!lg->pgdirs[next].pgdir)
next = lg->pgdidx;
+   else
+   /* There are no mappings: you'll need to re-pin */
+   *blank_pgdir = 1;
}
lg->pgdirs[next].cr3 = cr3;
/* Release all the non-kernel mappings. */
@@ -217,14 +238,15 @@ static unsigned int new_pgdir(struct lgu
 
 void guest_new_pagetable(struct lguest *lg, u32 pgtable)
 {
-   int newpgdir;
+   int newpgdir, repin = 0;
 
newpgdir = find_pgdir(lg, pgtable);
if (newpgdir == ARRAY_SIZE(lg->pgdirs))
-   newpgdir = new_pgdir(lg, pgtable);
+   newpgdir = new_pgdir(lg, pgtable, );
lg->pgdidx = newpgdir;
lg->cr3 = __pa(lg->pgdirs[lg->pgdidx].pgdir);
-   pin_stack_pages(lg);
+   if (repin)
+   pin_stack_pages(lg);
 }
 
 static void release_all_pagetables(struct lguest *lg)




[PATCH] Fix avr32 TIF atomicity in do_debug_priv

2007-03-08 Thread Mathieu Desnoyers
Fix avr32 TIF atomicity in do_debug_priv

avr32 updates the thread flags (1) non-atomically and (2) with the wrong
value (for TIF_SINGLE_STEP) in this function.

It applies to 2.6.20.
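
Both problems can be illustrated outside the kernel. The sketch below
uses C11 atomics as a stand-in for the kernel's atomic bit operations;
EXAMPLE_FLAG_BIT is a made-up bit number, not the real avr32 value of
TIF_SINGLE_STEP, and the function names are hypothetical. It shows that
OR-ing a bit *number* into a flags word sets unrelated bits, whereas an
atomic OR of the corresponding *mask* sets exactly the intended bit.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical bit number, chosen only for illustration. */
#define EXAMPLE_FLAG_BIT 6

/* The bug pattern: treats the bit number as if it were a mask. */
unsigned long or_in_bit_number(unsigned long flags, unsigned int nr)
{
    return flags | nr;
}

/* Atomic read-modify-write of the proper mask (1UL << nr). */
void set_flag_atomic(atomic_ulong *flags, unsigned int nr)
{
    assert(nr < sizeof(unsigned long) * CHAR_BIT);
    atomic_fetch_or(flags, 1UL << nr);
}

int flag_is_set(atomic_ulong *flags, unsigned int nr)
{
    return (int)((atomic_load(flags) >> nr) & 1UL);
}
```

With bit number 6, the buggy form yields 0x6 (bits 1 and 2 set, bit 6
clear), while the mask form yields 0x40.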

Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>

--- a/arch/avr32/kernel/ptrace.c
+++ b/arch/avr32/kernel/ptrace.c
@@ -306,14 +306,12 @@ asmlinkage void do_debug_priv(struct pt_regs *regs)
if (likely(ds & DS_SSS)) {
extern void itlb_miss(void);
extern void tlb_miss_common(void);
-   struct thread_info *ti;
 
dc = __mfdr(DBGREG_DC);
dc &= ~DC_SS;
__mtdr(DBGREG_DC, dc);
 
-   ti = current_thread_info();
-   ti->flags |= _TIF_BREAKPOINT;
+   set_tsk_thread_flag(tsk, TIF_BREAKPOINT);
 
/* The TLB miss handlers don't check thread flags */
if ((regs->pc >= (unsigned long)_miss)
@@ -328,7 +326,7 @@ asmlinkage void do_debug_priv(struct pt_regs *regs)
 * single step.
 */
if ((regs->sr & MODE_MASK) != MODE_SUPERVISOR)
-   ti->flags |= TIF_SINGLE_STEP;
+   set_tsk_thread_flag(tsk, TIF_SINGLE_STEP);
} else {
panic("Unable to handle debug trap at pc = %08lx\n",
  regs->pc);
-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


[PATCH 5/9] lguest: documentation fixes

2007-03-08 Thread Rusty Russell
1: It helps if you connect the bridge to a link.

Signed-off-by: James Morris <[EMAIL PROTECTED]>

2: You can theoretically run lguest with no boot parameters.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r 90134cf1fe0a Documentation/lguest/lguest.c
--- a/Documentation/lguest/lguest.c Wed Feb 21 11:32:59 2007 +1100
+++ b/Documentation/lguest/lguest.c Wed Feb 21 11:51:47 2007 +1100
@@ -938,7 +938,7 @@ int main(int argc, char *argv[])
argc--;
}
 
-   if (argc < 4)
+   if (argc < 3)
errx(1, "Usage: lguest [--verbose]  vmlinux "

"[--sharenet=|--tunnet=(|bridge:)"
"|--block=|--initrd=]... 
[args...]");
diff -r 90134cf1fe0a Documentation/lguest/lguest.txt
--- a/Documentation/lguest/lguest.txt   Wed Feb 21 11:32:59 2007 +1100
+++ b/Documentation/lguest/lguest.txt   Wed Feb 21 11:50:55 2007 +1100
@@ -90,6 +90,7 @@ Running Lguest:
 ifconfig eth0 0.0.0.0
 brctl addbr lg0
 ifconfig lg0 up
+brctl addif lg0 eth0
 dhclient lg0
 
   Then use --tunnet=bridge:lg0 when launching the guest.




[PATCH 4/9] lguest: cleanup: clean up regs save/restore

2007-03-08 Thread Rusty Russell
We previously put "cr3" in the guest regs restored and saved: the
guest cannot change cr3, so saving it is silly.  Hand it across to the
host<->guest switcher in ebx.

While we're there, only save the host registers we need to; tell GCC
we clobber everything we can.

Finally, trap 2 (NMI) doesn't supply an error code (we don't handle
NMI yet, but the test is wrong, so fix it before I get confused).
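
For context on the error-code point: on i386 only a handful of
exception vectors push an error code, and NMI is not one of them. The
helper below is a hedged sketch of such a test, not the one in the
patch (which is not shown in this excerpt); the function name is
hypothetical and the vector list is the classic i386 set (double fault,
invalid TSS, segment not present, stack fault, general protection, page
fault, alignment check).

```c
/*
 * Returns 1 if the CPU pushes an error code for this trap number:
 * vectors 8, 10 through 14, and 17.  Everything else, including
 * vector 2 (NMI) and software interrupts, pushes none.
 */
int trap_pushes_error_code(unsigned int trapnum)
{
    return trapnum == 8 ||
           (trapnum >= 10 && trapnum <= 14) ||
           trapnum == 17;
}
```

A stub for a trap without an error code has to push a dummy value
itself if it wants every handler to see the same stack layout.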

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r 6efda2f8ac22 arch/i386/lguest/core.c
--- a/arch/i386/lguest/core.c   Thu Mar 08 16:25:07 2007 +1100
+++ b/arch/i386/lguest/core.c   Thu Mar 08 16:51:17 2007 +1100
@@ -260,11 +260,11 @@ static void run_guest_once(struct lguest
 {
unsigned int clobber;
 
-   /* Put eflags on stack, lcall does rest. */
+   /* Put eflags on stack, lcall does rest: suitable for iret return. */
asm volatile("pushf; lcall *lguest_entry"
-: "=a"(clobber), "=d"(clobber)
-: "0"(lg->state), "1"(get_idt_table())
-: "memory");
+: "=a"(clobber), "=d"(clobber), "=b"(clobber)
+: "0"(lg->state), "1"(get_idt_table()), "2"(lg->cr3)
+: "memory", "%ecx", "%edi", "%esi");
 }
 
 int run_guest(struct lguest *lg, char *__user user)
diff -r 6efda2f8ac22 arch/i386/lguest/hypervisor.S
--- a/arch/i386/lguest/hypervisor.S Thu Mar 08 16:25:07 2007 +1100
+++ b/arch/i386/lguest/hypervisor.S Thu Mar 08 16:52:56 2007 +1100
@@ -4,26 +4,17 @@
 #include 
 #include "lg.h"
 
-#define SAVE_REGS  \
-   /* Save old guest/host state */ \
-   pushl   %es;\
-   pushl   %ds;\
-   pushl   %fs;\
-   pushl   %eax;   \
-   pushl   %gs;\
-   pushl   %ebp;   \
-   pushl   %edi;   \
-   pushl   %esi;   \
-   pushl   %edx;   \
-   pushl   %ecx;   \
-   pushl   %ebx;   \
-
 .text
 ENTRY(_start) /* ld complains unless _start is defined. */
-/* %eax contains ptr to target guest state, %edx contains host idt. */
+/* %eax contains ptr to target guest state, %edx contains host idt.
+   %ebx contains cr3 value.  All normal registers can be clobbered! */
 switch_to_guest:
-   pushl   %ss
-   SAVE_REGS
+   pushl   %es
+   pushl   %ds
+   pushl   %fs
+   pushl   %gs
+   pushl   %edx
+   pushl   %ebp
/* Save old stack, switch to guest's stack. */
movl    %esp, LGUEST_STATE_host_stackptr(%eax)
movl    %eax, %esp
@@ -33,17 +24,16 @@ switch_to_guest:
lgdt    LGUEST_STATE_gdt(%eax)
lidt    LGUEST_STATE_idt(%eax)
/* Save page table top. */
-   movl    %cr3, %ebx
-   movl    %ebx, LGUEST_STATE_host_pgdir(%eax)
+   movl    %cr3, %ecx
+   movl    %ecx, LGUEST_STATE_host_pgdir(%eax)
/* Set host's TSS to available (clear byte 5 bit 2). */
-   movl    (LGUEST_STATE_host_gdt+2)(%eax), %ebx
-   andb    $0xFD, (GDT_ENTRY_TSS*8 + 5)(%ebx)
+   movl    (LGUEST_STATE_host_gdt+2)(%eax), %ecx
+   andb    $0xFD, (GDT_ENTRY_TSS*8 + 5)(%ecx)
/* Switch to guest page tables */
-   popl    %ebx
movl    %ebx, %cr3
/* Switch to guest's TSS. */
-   movl    $(GDT_ENTRY_TSS*8), %ebx
-   ltr     %bx
+   movl    $(GDT_ENTRY_TSS*8), %edx
+   ltr     %dx
/* Restore guest regs */
popl    %ebx
popl    %ecx
@@ -66,10 +56,18 @@ switch_to_guest:
iret
 
 #define SWITCH_TO_HOST \
-   SAVE_REGS;  \
-   /* Save old pgdir */\
-   movl    %cr3, %eax; \
+   /* Save guest state */  \
+   pushl   %es;\
+   pushl   %ds;\
+   pushl   %fs;\
pushl   %eax;   \
+   pushl   %gs;\
+   pushl   %ebp;   \
+   pushl   %edi;   \
+   pushl   %esi;   \
+   pushl   %edx;   \
+   pushl   %ecx;   \
+   pushl   %ebx;   \
/* Load lguest ds segment for convenience. */

[PATCH 3/9] lguest: cleanup: allocate separate pages for switcher code

2007-03-08 Thread Rusty Russell
We don't need physically-contiguous pages for the hypervisor, since we
use map_vm_area anyway.

Two other related cleanups: pass the number of pages to
init_pagetables() so we can remove the constant from the header, and
call populate_hypervisor_pte_page() on each page as we allocate it,
rather than as a separate loop.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r 9fea34a28460 arch/i386/lguest/core.c
--- a/arch/i386/lguest/core.c   Thu Mar 08 16:09:00 2007 +1100
+++ b/arch/i386/lguest/core.c   Thu Mar 08 16:21:42 2007 +1100
@@ -24,17 +24,21 @@ static char __initdata hypervisor_blob[]
 #include "hypervisor-blob.c"
 };
 
-#define MAX_LGUEST_GUESTS\
-   (((PAGE_SIZE << HYPERVISOR_PAGE_ORDER) - sizeof(hypervisor_blob)) \
+/* 64k ought to be enough for anybody! */
+#define HYPERVISOR_PAGES (65536 / PAGE_SIZE)
+
+#define MAX_LGUEST_GUESTS  \
+   (((HYPERVISOR_PAGES * PAGE_SIZE) - sizeof(hypervisor_blob)) \
 / sizeof(struct lguest_state))
 
 static struct vm_struct *hypervisor_vma;
+/* Pages for hypervisor itself */
+static struct page *hype_page[HYPERVISOR_PAGES];
 static int cpu_had_pge;
 static struct {
unsigned long offset;
unsigned short segment;
 } lguest_entry __attribute_used__;
-struct page *hype_pages; /* Contiguous pages. */
 struct lguest lguests[MAX_LGUEST_GUESTS];
 DEFINE_MUTEX(lguest_lock);
 
@@ -58,15 +62,19 @@ struct lguest_state *__lguest_states(voi
 
 static __init int map_hypervisor(void)
 {
-   unsigned int i;
-   int err;
-   struct page *pages[HYPERVISOR_PAGES], **pagep = pages;
-
-   hype_pages = alloc_pages(GFP_KERNEL|__GFP_ZERO, HYPERVISOR_PAGE_ORDER);
-   if (!hype_pages)
-   return -ENOMEM;
-
-   hypervisor_vma = __get_vm_area(PAGE_SIZE << HYPERVISOR_PAGE_ORDER,
+   int i, err;
+   struct page **pagep = hype_page;
+
+   for (i = 0; i < ARRAY_SIZE(hype_page); i++) {
+   unsigned long addr = get_zeroed_page(GFP_KERNEL);
+   if (!addr) {
+   err = -ENOMEM;
+   goto free_some_pages;
+   }
+   hype_page[i] = virt_to_page(addr);
+   }
+
+   hypervisor_vma = __get_vm_area(ARRAY_SIZE(hype_page) * PAGE_SIZE,
   VM_ALLOC, HYPE_ADDR, VMALLOC_END);
if (!hypervisor_vma) {
err = -ENOMEM;
@@ -74,9 +82,6 @@ static __init int map_hypervisor(void)
goto free_pages;
}
 
-   for (i = 0; i < HYPERVISOR_PAGES; i++)
-   pages[i] = hype_pages + i;
-
err = map_vm_area(hypervisor_vma, PAGE_KERNEL, &pagep);
if (err) {
printk("lguest: map_vm_area failed: %i\n", err);
@@ -100,14 +105,20 @@ free_vma:
 free_vma:
vunmap(hypervisor_vma->addr);
 free_pages:
-   __free_pages(hype_pages, HYPERVISOR_PAGE_ORDER);
+   i = ARRAY_SIZE(hype_page);
+free_some_pages:
+   for (--i; i >= 0; i--)
+   __free_pages(hype_page[i], 0);
return err;
 }
 
 static __exit void unmap_hypervisor(void)
 {
+   unsigned int i;
+
vunmap(hypervisor_vma->addr);
-   __free_pages(hype_pages, HYPERVISOR_PAGE_ORDER);
+   for (i = 0; i < ARRAY_SIZE(hype_page); i++)
+   __free_pages(hype_page[i], 0);
 }
 
 /* IN/OUT insns: enough to get us past boot-time probing. */
@@ -390,7 +401,7 @@ static int __init init(void)
if (err)
return err;
 
-   err = init_pagetables(hype_pages);
+   err = init_pagetables(hype_page, HYPERVISOR_PAGES);
if (err) {
unmap_hypervisor();
return err;
diff -r 9fea34a28460 arch/i386/lguest/lg.h
--- a/arch/i386/lguest/lg.h Thu Mar 08 16:09:00 2007 +1100
+++ b/arch/i386/lguest/lg.h Thu Mar 08 16:21:42 2007 +1100
@@ -2,9 +2,6 @@
 #define _LGUEST_H
 
 #include 
-/* 64k ought to be enough for anybody! */
-#define HYPERVISOR_PAGE_ORDER (16 - PAGE_SHIFT)
-#define HYPERVISOR_PAGES (1 << HYPERVISOR_PAGE_ORDER)
 
 #define GDT_ENTRY_LGUEST_CS 10
 #define GDT_ENTRY_LGUEST_DS 11
@@ -43,7 +40,7 @@ struct lguest_regs
 };
 
 __exit void free_pagetables(void);
-__init int init_pagetables(struct page *hype_pages);
+__init int init_pagetables(struct page **hype_page, int pages);
 
 /* Full 4G segment descriptors, suitable for CS and DS. */
 #define FULL_EXEC_SEGMENT ((struct desc_struct){0x0000ffff, 0x00cf9b00})
@@ -122,7 +119,6 @@ struct lguest
struct host_trap interrupt[LGUEST_IRQS];
 };
 
-extern struct page *hype_pages; /* Contiguous pages. */
 extern struct lguest lguests[];
 extern struct mutex lguest_lock;
 
diff -r 9fea34a28460 arch/i386/lguest/page_tables.c
--- a/arch/i386/lguest/page_tables.c Thu Mar 08 16:09:00 2007 +1100
+++ b/arch/i386/lguest/page_tables.c Thu Mar 08 16:24:56 2007 +1100
@@ -328,9 +328,23 @@ static void free_hypervisor_pte_pages(vo

[PATCH] Fix sparc TIF_USEDFPU flag atomicity

2007-03-08 Thread Mathieu Desnoyers
Fix sparc TIF_USEDFPU flag atomicity

Non atomic update of TIF can be very dangerous, except at thread structure
creation time. Here I standardize the TIF_USEDFPU usage of the sparc arch.

Applies on 2.6.20.

Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>

--- a/arch/sparc/kernel/process.c
+++ b/arch/sparc/kernel/process.c
@@ -348,7 +348,7 @@ void exit_thread(void)
 #ifndef CONFIG_SMP
if(last_task_used_math == current) {
 #else
-   if(current_thread_info()->flags & _TIF_USEDFPU) {
+   if(test_tsk_thread_flag(current, TIF_USEDFPU)) {
 #endif
/* Keep process from leaving FPU in a bogon state. */
put_psr(get_psr() | PSR_EF);
@@ -357,7 +357,7 @@ void exit_thread(void)
 #ifndef CONFIG_SMP
last_task_used_math = NULL;
 #else
-   current_thread_info()->flags &= ~_TIF_USEDFPU;
+   clear_tsk_thread_flag(current, TIF_USEDFPU);
 #endif
}
 }
@@ -371,7 +371,7 @@ void flush_thread(void)
 #ifndef CONFIG_SMP
if(last_task_used_math == current) {
 #else
-   if(current_thread_info()->flags & _TIF_USEDFPU) {
+   if(test_tsk_thread_flag(current, TIF_USEDFPU)) {
 #endif
/* Clean the fpu. */
put_psr(get_psr() | PSR_EF);
@@ -380,7 +380,7 @@ void flush_thread(void)
 #ifndef CONFIG_SMP
last_task_used_math = NULL;
 #else
-   current_thread_info()->flags &= ~_TIF_USEDFPU;
+   clear_tsk_thread_flag(current, TIF_USEDFPU);
 #endif
}
 
@@ -466,13 +466,13 @@ int copy_thread(int nr, unsigned long clone_flags, 
unsigned long sp,
 #ifndef CONFIG_SMP
if(last_task_used_math == current) {
 #else
-   if(current_thread_info()->flags & _TIF_USEDFPU) {
+   if(test_tsk_thread_flag(current, TIF_USEDFPU)) {
 #endif
put_psr(get_psr() | PSR_EF);
fpsave(>thread.float_regs[0], >thread.fsr,
   >thread.fpqueue[0], >thread.fpqdepth);
 #ifdef CONFIG_SMP
-   current_thread_info()->flags &= ~_TIF_USEDFPU;
+   clear_tsk_thread_flag(current, TIF_USEDFPU);
 #endif
}
 
@@ -609,13 +609,13 @@ int dump_fpu (struct pt_regs * regs, elf_fpregset_t * 
fpregs)
return 1;
}
 #ifdef CONFIG_SMP
-   if (current_thread_info()->flags & _TIF_USEDFPU) {
+   if (test_tsk_thread_flag(current, TIF_USEDFPU)) {
put_psr(get_psr() | PSR_EF);
fpsave(>thread.float_regs[0], >thread.fsr,
   >thread.fpqueue[0], >thread.fpqdepth);
if (regs != NULL) {
regs->psr &= ~(PSR_EF);
-   current_thread_info()->flags &= ~(_TIF_USEDFPU);
+   clear_tsk_thread_flag(current, TIF_USEDFPU);
}
}
 #else
diff --git a/arch/sparc/kernel/traps.c b/arch/sparc/kernel/traps.c
index 6a70d21..7a7ad05 100644
--- a/arch/sparc/kernel/traps.c
+++ b/arch/sparc/kernel/traps.c
@@ -259,7 +259,7 @@ void do_fpd_trap(struct pt_regs *regs, unsigned long pc, 
unsigned long npc,
} else {
fpload(>thread.float_regs[0], >thread.fsr);
}
-   current_thread_info()->flags |= _TIF_USEDFPU;
+   set_tsk_thread_flag(current, TIF_USEDFPU);
 #endif
 }
 
@@ -290,7 +290,7 @@ void do_fpe_trap(struct pt_regs *regs, unsigned long pc, 
unsigned long npc,
 #ifndef CONFIG_SMP
if(!fpt) {
 #else
-if(!(task_thread_info(fpt)->flags & _TIF_USEDFPU)) {
+if(!test_tsk_thread_flag(fpt, TIF_USEDFPU)) {
 #endif
fpsave(_regs[0], _fsr, _queue[0], _depth);
regs->psr &= ~PSR_EF;
@@ -333,7 +333,7 @@ void do_fpe_trap(struct pt_regs *regs, unsigned long pc, 
unsigned long npc,
/* nope, better SIGFPE the offending process... */
   
 #ifdef CONFIG_SMP
-   task_thread_info(fpt)->flags &= ~_TIF_USEDFPU;
+   clear_tsk_thread_flag(fpt, TIF_USEDFPU);
 #endif
if(psr & PSR_PS) {
/* The first fsr store/load we tried trapped,
-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


Re: Any faster and more efficient way to repeatedly access /proc/*

2007-03-08 Thread Matthew Helsley
On Thu, 2007-03-08 at 16:55 -0800, [EMAIL PROTECTED] wrote:
> Hi,
> 
> Is there a faster way to access "/proc/*" other than open it as a file and 
> reading/parsing contents? e.g. fopen("/proc/stat", "r");
> 
> In BSD, there is the kvm method of access, which is relatively fast (light 
> weight)
> 
> In Linux, if I have a daemon that keeps track of these statistics, it's a 
> hell way to manage.
> 
> Imagine, having to probe the stat of each process?

Have you looked at Task Stats (CONFIG_TASKSTATS)?

Cheers,
-Matt Helsley



[PATCH 2/9] lguest: bridging support in example code

2007-03-08 Thread Rusty Russell
Expand the --tunnet option to take a bridge name as an argument, so that
the tap interface is added to the specified bridge.  This makes it
convenient to use bridging for connecting the guest to external networks.

Signed-off-by: James Morris <[EMAIL PROTECTED]>
Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r cff3d561d1b0 Documentation/lguest/lguest.c
--- a/Documentation/lguest/lguest.c Thu Mar 08 15:52:15 2007 +1100
+++ b/Documentation/lguest/lguest.c Thu Mar 08 16:08:36 2007 +1100
@@ -23,7 +23,8 @@
 #include 
 #include 
 #include 
-#include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -36,6 +37,7 @@ typedef uint8_t u8;
 
 #define PAGE_PRESENT 0x7   /* Present, RW, Execute */
 #define NET_PEERNUM 1
+#define BRIDGE_PFX "bridge:"
 
 static bool verbose;
 #define verbose(args...) \
@@ -582,20 +584,16 @@ static u32 handle_block_output(int fd, c
((u8)(ip >> 8)),\
((u8)(ip))
 
-static void configure_device(const char *devname, u32 ipaddr,
+static void configure_device(int fd, const char *devname, u32 ipaddr,
 unsigned char hwaddr[6])
 {
struct ifreq ifr;
-   int fd;
struct sockaddr_in *sin = (struct sockaddr_in *)&ifr.ifr_addr;
 
memset(&ifr, 0, sizeof(ifr));
strcpy(ifr.ifr_name, devname);
sin->sin_family = AF_INET;
sin->sin_addr.s_addr = htonl(ipaddr);
-   fd = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
-   if (fd < 0)
-   err(1, "opening IP socket");
if (ioctl(fd, SIOCSIFADDR, &ifr) != 0)
err(1, "Setting %s interface address", devname);
ifr.ifr_flags = IFF_UP;
@@ -724,13 +722,34 @@ static u32 str2ip(const char *ipaddr)
return (byte[0] << 24) | (byte[1] << 16) | (byte[2] << 8) | byte[3];
 }
 
-static void setup_tun_net(const char *ipaddr,
+/* adapted from libbridge */
+static void add_to_bridge(int fd, const char *if_name, const char *br_name)
+{
+   int ifidx;
+   struct ifreq ifr;
+
+   if (!*br_name)
+   errx(1, "must specify bridge name");
+
+   ifidx = if_nametoindex(if_name);
+   if (!ifidx)
+   errx(1, "interface %s does not exist!", if_name);
+
+   strncpy(ifr.ifr_name, br_name, IFNAMSIZ);
+   ifr.ifr_ifindex = ifidx;
+   if (ioctl(fd, SIOCBRADDIF, &ifr) < 0)
+   err(1, "can't add %s to bridge %s", if_name, br_name);
+}
+
+static void setup_tun_net(const char *arg,
  struct lguest_device_desc *descs,
  struct devices *devices)
 {
struct device *dev;
struct ifreq ifr;
-   int netfd;
+   int netfd, ipfd;
+   u32 ipaddr;
+   const char *br_name = NULL;
 
netfd = open("/dev/net/tun", O_RDWR);
if (netfd < 0)
@@ -748,15 +767,29 @@ static void setup_tun_net(const char *ip
dev->priv = malloc(sizeof(bool));
*(bool *)dev->priv = false;
 
+   ipfd = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
+   if (ipfd < 0)
+   err(1, "opening IP socket");
+
+   if (!strncmp(BRIDGE_PFX, arg, strlen(BRIDGE_PFX))) {
+   ipaddr = INADDR_ANY;
+   br_name = arg + strlen(BRIDGE_PFX);
+   add_to_bridge(ipfd, ifr.ifr_name, br_name);
+   } else
+   ipaddr = str2ip(arg);
+
/* We are peer 0, rest is all NO_GUEST */
memset(dev->mem, 0xFF, getpagesize());
-   configure_device(ifr.ifr_name, str2ip(ipaddr), dev->mem);
+   configure_device(ipfd, ifr.ifr_name, ipaddr, dev->mem);
+   close(ipfd);
 
/* You will be peer 1: we should create enough jitter to randomize */
dev->desc->features = NET_PEERNUM|LGUEST_DEVICE_F_RANDOMNESS;
verbose("device %p@%p: tun net %u.%u.%u.%u\n", dev->desc,
(void *)(dev->desc->pfn * getpagesize()),
-   HIPQUAD(str2ip(ipaddr)));
+   HIPQUAD(ipaddr));
+   if (br_name)
+   verbose("attached to bridge: %s\n", br_name);
 }
 
 static void setup_block_file(const char *filename,
@@ -887,8 +920,8 @@ int main(int argc, char *argv[])
 
if (argc < 4)
errx(1, "Usage: lguest [--verbose]  vmlinux "
-   
"[--sharenet=|--tunnet=|--block="
-   "|--initrd=]... [args...]");
+   
"[--sharenet=|--tunnet=(|bridge:)"
+   "|--block=|--initrd=]... 
[args...]");
 
zero_fd = open("/dev/zero", O_RDONLY, 0);
if (zero_fd < 0)
diff -r cff3d561d1b0 Documentation/lguest/lguest.txt
--- a/Documentation/lguest/lguest.txt   Thu Mar 08 15:52:15 2007 +1100
+++ b/Documentation/lguest/lguest.txt   Thu Mar 08 16:02:49 2007 +1100
@@ -77,10 +77,26 @@ Running Lguest:
   /proc/sys/net/ipv4/ip_forward".  In this example, I would configure
   eth0 inside the guest at 192.168.19.2.
 
+  Another method is to bridge the tap device to an external interface
+  using --tunnet=bridge:, and 

[PATCH 1/9] lguest: block device speedup

2007-03-08 Thread Rusty Russell
Jens Axboe pointed out that end_request() does not end the entire
request.  Go figure.  On the upside, he wrote the replacement for me!
Now we do far less block traffic, and our performance sucks less.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r fdc8cbc1fd61 drivers/block/lguest_blk.c
--- a/drivers/block/lguest_blk.c Thu Mar 08 13:35:39 2007 +1100
+++ b/drivers/block/lguest_blk.c Thu Mar 08 15:51:55 2007 +1100
@@ -45,6 +45,16 @@ struct blockdev
struct request *req;
 };
 
+/* Jens gave me this nice helper to end all chunks of a request. */
+static void end_entire_request(struct request *req, int uptodate)
+{
+   if (end_that_request_first(req, uptodate, req->hard_nr_sectors))
+   BUG();
+   add_disk_randomness(req->rq_disk);
+   blkdev_dequeue_request(req);
+   end_that_request_last(req, uptodate);
+}
+
 static irqreturn_t lgb_irq(int irq, void *_bd)
 {
struct blockdev *bd = _bd;
@@ -61,7 +71,7 @@ static irqreturn_t lgb_irq(int irq, void
}
 
spin_lock_irqsave(&bd->lock, flags);
-   end_request(bd->req, bd->lb_page->result == 1);
+   end_entire_request(bd->req, bd->lb_page->result == 1);
bd->req = NULL;
bd->dma.used_len = 0;
blk_start_queue(bd->disk->queue);
@@ -149,7 +159,7 @@ again:
pr_debug("Got non-command 0x%08x\n", req->cmd_type);
error:
req->errors++;
-   end_request(req, 0);
+   end_entire_request(req, 0);
goto again;
} else {
if (rq_data_dir(req) == WRITE)




Re: [patch] epoll use a single inode ...

2007-03-08 Thread Anton Blanchard

Hi,

> Well, PowerPC "dcbt" does prefetch() correctly, it doesn't ever raise  
> exceptions, doesn't have any side effects, takes only enough CPU to  
> decode the address, and is ignored if it would have to do anything  
> other than load the cacheline from RAM.  Prefetch streams are halted  
> when they reach the end of a page boundary (no trapping to the MMU)  
> and if the TLB entry isn't present then they would asynchronously  
> abort.  

It depends on the implementation and the HID bit settings. Some do walk
the MMU hashtable if it isn't in the TLB.

Anton


Re: [patch] epoll use a single inode ...

2007-03-08 Thread Anton Blanchard

> OK, 200 cycles...
> 
> But what is the cost of the conditional branch you added in prefetch(x) ?

Much less than the tablewalk. On ppc64 a tablewalk of an address that is
not populated in the hashtable will incur 2 cacheline lookups (primary
and secondary buckets). This plus the MMU state machine overhead adds up.
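
To make the trade-off concrete, here is a minimal userspace sketch of a prefetch wrapper that tests for NULL before issuing the hint. It is not the kernel's prefetch(); guarded_prefetch(), struct node and sum_list() are hypothetical names, and __builtin_prefetch is the GCC/Clang builtin, used here on the assumption that the compiler provides it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper, not the kernel's prefetch(): skip the hint for a
 * NULL pointer so the CPU is never asked to translate address 0. */
static inline int guarded_prefetch(const void *p)
{
	if (p == NULL)
		return 0;		/* no hint issued */
	__builtin_prefetch(p);		/* GCC/Clang builtin; only a hint */
	return 1;			/* hint issued */
}

struct node {
	struct node *next;
	int value;
};

/* Walk a NULL-terminated list, prefetching the next element each step.
 * *hints_issued counts how many prefetch hints were actually sent. */
static int sum_list(const struct node *head, int *hints_issued)
{
	int sum = 0;

	assert(hints_issued != NULL);
	*hints_issued = 0;
	for (const struct node *n = head; n != NULL; n = n->next) {
		*hints_issued += guarded_prefetch(n->next);
		sum += n->value;
	}
	return sum;
}
```

On a three-element list the wrapper issues two hints and skips the final NULL, so the cost per element is one compare-and-branch instead of a possible tablewalk at the end of every list.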

Cue Linus rant about PowerPC MMU :)

Anton


[PATCH] Fix atomicity of TIF update in flush_thread() for powerpc

2007-03-08 Thread Mathieu Desnoyers
Fix atomicity of TIF update in flush_thread() for powerpc

Race :

parent process executing :
sys_ptrace()
 (lock_kernel())
 (ptrace_get_task_struct(pid))
 arch_ptrace()
   ptrace_detach()
 ptrace_disable(child);
   clear_singlestep(child);
 clear_tsk_thread_flag(child, TIF_SINGLESTEP);
 (which clears the TIF_SINGLESTEP flag atomically from a different
  process)
 (put_task_struct(child))
 (unlock_kernel())

And at the same time, in the child process :
sys_execve()
 do_execve()
   search_binary_handler()
 load_elf_binary()
   flush_old_exec()
 flush_thread()
   doing a non-atomic thread flag update 
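
The lost update can be replayed by hand in userspace. The sketch below is only an illustration under assumptions: it uses C11 atomics rather than the kernel's thread-flag helpers, the FLAG_* bit values are made up, and both functions run the interleaving sequentially on one thread so the outcome is deterministic.

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative bit values only; the real TIF_* numbers are per-architecture. */
#define FLAG_SINGLESTEP  (1UL << 0)
#define FLAG_ABI_PENDING (1UL << 1)
#define FLAG_32BIT       (1UL << 2)

/* Replay the bad interleaving: child loads, parent clears SINGLESTEP,
 * child stores its stale copy back. Returns the final flags word. */
static unsigned long replay_nonatomic(void)
{
	unsigned long flags = FLAG_SINGLESTEP | FLAG_ABI_PENDING;
	unsigned long stale;

	stale = flags;				/* child: plain load */
	flags &= ~FLAG_SINGLESTEP;		/* parent: clear lands here */
	if (stale & FLAG_ABI_PENDING)		/* child: modify stale copy */
		stale ^= FLAG_ABI_PENDING | FLAG_32BIT;
	flags = stale;				/* child: plain store */
	return flags;
}

/* Same ordering, but every update is one atomic read-modify-write. */
static unsigned long replay_atomic(void)
{
	atomic_ulong flags;
	int pending;

	atomic_init(&flags, FLAG_SINGLESTEP | FLAG_ABI_PENDING);
	pending = (atomic_load(&flags) & FLAG_ABI_PENDING) != 0;	/* child */
	assert(pending);
	atomic_fetch_and(&flags, ~FLAG_SINGLESTEP);			/* parent */
	if (pending)
		atomic_fetch_xor(&flags, FLAG_ABI_PENDING | FLAG_32BIT);	/* child */
	return atomic_load(&flags);
}
```

In the non-atomic replay the SINGLESTEP bit that the parent cleared reappears in the final word; in the atomic replay it stays cleared and only the 32BIT bit remains set.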

Applies on 2.6.20.

Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>

--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -476,8 +476,13 @@ void flush_thread(void)
 #ifdef CONFIG_PPC64
struct thread_info *t = current_thread_info();
 
-   if (t->flags & _TIF_ABI_PENDING)
-   t->flags ^= (_TIF_ABI_PENDING | _TIF_32BIT);
+   if (test_tsk_thread_flag(tsk, TIF_ABI_PENDING)) {
+   clear_tsk_thread_flag(tsk, TIF_ABI_PENDING);
+   if (test_tsk_thread_flag(tsk, TIF_32BIT))
+   clear_tsk_thread_flag(tsk, TIF_32BIT);
+   else
+   set_tsk_thread_flag(tsk, TIF_32BIT);
+   }
 #endif
 
discard_lazy_cpu_state();
-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


Re: [patch] epoll use a single inode ...

2007-03-08 Thread Anton Blanchard

> Yeah, I'm not at all surprised. Any implementation of "prefetch" that 
> doesn't just turn into a no-op if the TLB entry doesn't exist (which makes 
> them weaker for *actual* prefetching) will generally have a hard time with 
> a NULL pointer. Exactly because it will try to do a totally unnecessary 
> TLB fill - and since most CPU's will not cache negative TLB entries, that 
> unnecessary TLB fill will be done over and over and over again..

Yeah this is exactly what we were seeing :)

Anton


Re: PAGE_SIZE Availability Inconsistency

2007-03-08 Thread Anton Blanchard

Hi,

> I might be missing something but doesn't this break every
> SWAP partition that was created with something other than
> MIN_PAGE_SIZE?

It does. I was thinking we could work around it in ppc64 (64kB is quite
new), but I forgot there are options on sparc64 to change the page size :)

The other option is to create a v3 swap format that doesn't use any
PAGE_SIZE parameters.

Anton


[PATCH] Fix atomicity of TIF update in flush_thread() for sparc64

2007-03-08 Thread Mathieu Desnoyers
Fix atomicity of TIF update in flush_thread() for sparc64

Race :

parent process executing :
sys_ptrace()
 (lock_kernel())
 (ptrace_get_task_struct(pid))
 arch_ptrace()
   ptrace_detach()
 ptrace_disable(child);
   clear_singlestep(child);
 clear_tsk_thread_flag(child, TIF_SINGLESTEP);
 (which clears the TIF_SINGLESTEP flag atomically from a different
  process)
 (put_task_struct(child))
 (unlock_kernel())

And at the same time, in the child process :
sys_execve()
 do_execve()
   search_binary_handler()
 load_elf_binary()
   flush_old_exec()
 flush_thread()
   doing a non-atomic thread flag update 

It applies on 2.6.20.

Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>

--- a/arch/sparc64/kernel/process.c
+++ b/arch/sparc64/kernel/process.c
@@ -413,8 +413,13 @@ void flush_thread(void)
struct thread_info *t = current_thread_info();
struct mm_struct *mm;
 
-   if (t->flags & _TIF_ABI_PENDING)
-   t->flags ^= (_TIF_ABI_PENDING | _TIF_32BIT);
+   if (test_tsk_thread_flag(tsk, TIF_ABI_PENDING)) {
+   clear_tsk_thread_flag(tsk, TIF_ABI_PENDING);
+   if (test_tsk_thread_flag(tsk, TIF_32BIT))
+   clear_tsk_thread_flag(tsk, TIF_32BIT);
+   else
+   set_tsk_thread_flag(tsk, TIF_32BIT);
+   }
 
mm = t->task->mm;
if (mm)
-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


Re: [PATCH 1/2] rcfs core patch

2007-03-08 Thread Paul Jackson
Herbert wrote:
> why is the filesystem approach so favored for this
> kind of manipulations?

I don't have any clear sense of whether the additional uses of file
systems being considered here are a good idea or not, but the use of a
file system for cpusets has turned out quite well, in my (vain and
biased ;) view.

Cpusets are subsets of the CPUs and memory nodes on a system.

These subsets naturally form a partial ordering, where one cpuset is
below another if its CPUs and nodes are a subset of the other ones.

This forms a natural hierarchical space.  It is quite convenient to be
able to add names and file system like attributes, so that one can do
things like -name- the set of CPUs to which you are attaching a job, as
in "this job is to run on the CPUs in cpuset /foo/bar", and to further
have file system like permissions on these subsets, to control who can
access or modify them.

For such hierarchical data structures, especially ones where names and
permissions are useful, file systems are a more natural interface than
traditional system call usage patterns.

The key, in my view, is the 'shape' of the data.  If the data schema is
basically a single table, with uniform rows having a few fields each,
where each field is a simple integer or string (not a fancy formatted
string encoding some more elaborate shape) then classic system call
patterns work well.  If the schema is tree shaped, and especially if
the usual file system attributes such as a hierarchical name space and
permissions are useful, then a file system based API is probably best.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: [PATCH] fix read past end of array in md/linear.c

2007-03-08 Thread Bill Davidsen

Andy Isaacson wrote:

When iterating through an array, one must be careful to test one's index
variable rather than another similarly-named variable.  


The loop will read off the end of conf->disks[] in the following
(pathological) case:

% dd bs=1 seek=840716287 if=/dev/zero of=d1 count=1
% for i in 2 3 4; do dd if=/dev/zero of=d$i bs=1k count=$(($i+150)); done
% ./vmlinux ubd0=root ubd1=d1 ubd2=d2 ubd3=d3 ubd4=d4
# mdadm -C /dev/md0 --level=linear --raid-devices=4 /dev/ubd[1234]

adding some printks, I saw this:
[42949374.96] hash_spacing = 821120
[42949374.96] cnt  = 4
[42949374.96] min_spacing  = 801
[42949374.96] j=0 size=820928 sz=820928
[42949374.96] i=0 sz=820928 hash_spacing=820928
[42949374.96] j=1 size=64 sz=64
[42949374.96] j=2 size=64 sz=128
[42949374.96] j=3 size=64 sz=192
[42949374.96] j=4 size=1515870810 sz=1515871002

Index: linus/drivers/md/linear.c
===
--- linus.orig/drivers/md/linear.c  2007-03-02 11:35:55.0 -0800
+++ linus/drivers/md/linear.c   2007-03-07 13:10:30.0 -0800
@@ -188,7 +188,7 @@
for (i=0; i < cnt-1 ; i++) {
sector_t sz = 0;
int j;
-   for (j=i; i<cnt-1 && sz < min_spacing ; j++)
+   for (j=i; j<cnt-1 && sz < min_spacing ; j++)
sz += conf->disks[j].size;
if (sz >= min_spacing && sz < conf->hash_spacing)
conf->hash_spacing = sz;


After looking at that code, I have to wonder how this ever worked, or if 
in fact anyone ever took this path. I assume that the value of sz caused 
the loop exit in all cases, since this has been in the code at least 
since 2.6.15, oldest thing I have handy.
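
For anyone wanting to poke at the loop outside the kernel, here is a simplified userspace stand-in. It is a sketch under assumptions: scan_spacing() is a hypothetical name, the sizes live in a plain array instead of conf->disks[], and the inner loop condition (j < cnt-1 && sz < min_spacing) is my reading of the fixed loop, not a copy of the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the spacing scan. Returns the highest array
 * index that was read, so a caller can check it stayed in bounds. */
static size_t scan_spacing(const unsigned long *size, size_t cnt,
			   unsigned long min_spacing,
			   unsigned long *hash_spacing)
{
	size_t max_read = 0;

	assert(size != NULL && hash_spacing != NULL && cnt >= 1);
	for (size_t i = 0; i < cnt - 1; i++) {
		unsigned long sz = 0;

		/* Bound the inner loop on j, the index actually used. */
		for (size_t j = i; j < cnt - 1 && sz < min_spacing; j++) {
			sz += size[j];
			if (j > max_read)
				max_read = j;
		}
		if (sz >= min_spacing && sz < *hash_spacing)
			*hash_spacing = sz;
	}
	return max_read;
}
```

Fed the values from the printk output above (sizes 820928, 64, 64, 64 with min_spacing 801), the bounded loop never reads past index 2, whereas the trace shows the old loop running on to j=4.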


--
bill davidsen <[EMAIL PROTECTED]>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: [patch 2/5] signalfd v4 - signalfd core ...

2007-03-08 Thread Davide Libenzi
On Thu, 8 Mar 2007, Davide Libenzi wrote:

> +static ssize_t signalfd_read(struct file *file, char *buf, size_t count,
> +  loff_t *ppos)
> +{
> + struct signalfd_ctx *ctx = file->private_data;
> + struct sighand_struct *sighand = ctx->sighand;
> + ssize_t res = 0;
> + int signo = 0;
> + siginfo_t info;
> + DECLARE_WAITQUEUE(wait, current);
> +
> + if (count < sizeof(struct signalfd_siginfo))
> + return -EINVAL;
> + spin_lock_irq(&sighand->siglock);
> + if (unlikely(sighand != ctx->tsk->sighand))
> + goto out_unlock;
> + res = -EAGAIN;
> + if ((signo = dequeue_signal(ctx->tsk, &ctx->sigmask, &info)) != 0 &&

Grrr, never change the code after you tested it. The above is clearly:

if ((signo = dequeue_signal(ctx->tsk, &ctx->sigmask, &info)) == 0 &&
...



- Davide



