Re: [BUG] 2.6.22-rc3-mm1 remove bluetooth usb adapter caused kmalloc bug

2007-06-05 Thread young dave

Hi,

2007/6/6, Christoph Lameter <[EMAIL PROTECTED]>:
> Note that the corruption seems to have its cause in a decrement done at
> offset 16 into the object pointing to the refcount in struct hci_dev. So
> it looks like the refcount was decremented after the object was freed.
>
> sysfs related?


I noticed in hci_core.c:

hci_dev_close() calls hci_dev_do_close() and then calls hci_dev_put(),

but hci_dev_do_close() also calls hci_dev_put().

Maybe this is the reason. With the patch below applied the bug no longer
seems to occur, but the strange thing is that 2.6.22-rc4 seems to work. I
will test once more to confirm the result.

Signed-off-by: dave young <[EMAIL PROTECTED]>
---
net/bluetooth/hci_core.c |1 -
1 file changed, 1 deletion(-)

diff -dur linux/net/bluetooth/hci_core.c linux.new/net/bluetooth/hci_core.c
--- linux/net/bluetooth/hci_core.c  2007-06-06 13:47:14.0 +
+++ linux.new/net/bluetooth/hci_core.c  2007-06-06 13:46:58.0 +
@@ -577,7 +577,6 @@

   hci_req_unlock(hdev);

-   hci_dev_put(hdev);
   return 0;
}
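
To illustrate why one unmatched put would produce the symptom Christoph describes (a decrement after the object was freed), here is a minimal userspace sketch. It is not the hci_core.c code: struct dev, dev_hold(), dev_put() and the other names are hypothetical stand-ins, and the object is only flagged as freed so that the late decrement can be observed safely.

```c
#include <assert.h>

/* Hypothetical stand-in for a refcounted object such as struct hci_dev. */
struct dev {
	int refcnt;
	int freed;	/* set instead of calling free() so the sketch stays inspectable */
};

static void dev_hold(struct dev *d)
{
	d->refcnt++;
}

/* Drop one reference; returns -1 if the object had already been released. */
static int dev_put(struct dev *d)
{
	if (d->freed)
		return -1;	/* in the kernel: a decrement of freed memory */
	if (--d->refcnt == 0)
		d->freed = 1;
	return 0;
}

/* Inner helper; extra_put models the suspected unmatched put. */
static int dev_do_close(struct dev *d, int extra_put)
{
	return extra_put ? dev_put(d) : 0;
}

/* Outer helper: takes its own reference, closes, then drops that reference. */
static int dev_close(struct dev *d, int extra_put)
{
	int err;

	dev_hold(d);
	err = dev_do_close(d, extra_put);
	if (dev_put(d))
		err = -1;
	return err;
}

/* Returns 0 if the final "unregister" put was balanced, -1 otherwise. */
int simulate_close_then_unregister(int extra_put)
{
	struct dev d = { .refcnt = 1, .freed = 0 };	/* registration reference */

	dev_close(&d, extra_put);
	return dev_put(&d);	/* unregister drops the registration reference */
}
```

With extra_put set, the close path already drops the count to zero, so the later unregister put lands on a released object.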

Regards
dave
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/4] audit: rework execve audit

2007-06-05 Thread Peter Zijlstra
On Tue, 2007-06-05 at 16:39 -0700, Andrew Morton wrote:
> On Tue, 05 Jun 2007 17:05:25 +0200
> Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> 
> > The purpose of audit_bprm() is to log the argv array to a userspace daemon at
> > the end of the execve system call. Since user-space hasn't had time to run,
> > this array is still in pristine state on the process' stack; so no need to copy
> > it, we can just grab it from there.
> > 
> > In order to minimize the damage to audit_log_*() copy each string into a
> > temporary kernel buffer first.
> > 
> > Currently the audit code requires that the full argument vector fits in a
> > single packet. So currently it does clip the argv size to a (sysctl) limit, but
> > only when execve auditing is enabled.
> > 
> > If the audit protocol gets extended to allow for multiple packets this check
> > can be removed.
> > 
> > ...
> >  
> 
> Please try to avoid trigger-happiness with the BUG_ON()s..
> 
> >  struct audit_aux_data_socketcall {
> > @@ -834,6 +834,47 @@ static int audit_log_pid_context(struct 
> > return rc;
> >  }
> >  
> > +static void audit_log_execve_info(struct audit_buffer *ab,
> > +		struct audit_aux_data_execve *axi)
> > +{
> > +	int i;
> > +	long len;
> > +	const char __user *p = (const char __user *)axi->mm->arg_start;
> > +
> > +	if (axi->mm != current->mm)
> > +		return; /* execve failed, no additional info */
> > +
> > +	for (i = 0; i < axi->argc; i++, p += len) {
> > +		long ret;
> > +		char *tmp;
> > +
> > +		len = strnlen_user(p, MAX_ARG_PAGES*PAGE_SIZE);
> > +		/*
> > +		 * We just created this mm, if we can't find the strings
> > +		 * we just copied in something is _very_ wrong.
> > +		 */
> > +		BUG_ON(!len);
> > +
> > +		tmp = kmalloc(len, GFP_KERNEL);
> > +		if (!tmp) {
> > +			audit_panic("out of memory for argv string\n");
> > +			break;
> > +		}
> > +
> > +		ret = copy_from_user(tmp, p, len);
> > +		/*
> > +		 * There is no reason for this copy to be short.
> > +		 */
> > +		BUG_ON(ret);
> 
> You sure?  What happens if another thread does munmap() in parallel?
> 
> I think I'll make this WARN_ON just out of principle.

This is right after the execve call, and before we've hit userspace, so
at this time there is no runnable context with access to the memory
(except this one).
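
For readers following the loop in the quoted hunk, here is a rough userspace sketch of the same walk: strings packed back to back, each measured including its NUL (as strnlen_user() does), copied to a temporary buffer, then handed on. It reports a malformed area as an error instead of BUG_ON(). All function names are hypothetical and nothing here is the audit code itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Walk argc NUL-terminated strings packed back to back in area[0..size),
 * copying each one into a temporary buffer before handing it to emit().
 * Returns the number of strings emitted, or -1 on a malformed area or
 * allocation failure.
 */
int walk_packed_args(const char *area, size_t size, int argc,
		     void (*emit)(const char *arg, void *ctx), void *ctx)
{
	const char *p = area;
	int i;

	for (i = 0; i < argc; i++) {
		size_t left = size - (size_t)(p - area);
		size_t len = strnlen(p, left);
		char *tmp;

		if (len == left)
			return -1;	/* no terminator inside the area */
		len++;			/* count the NUL, like strnlen_user() */

		tmp = malloc(len);
		if (!tmp)
			return -1;
		memcpy(tmp, p, len);
		emit(tmp, ctx);
		free(tmp);
		p += len;
	}
	return i;
}

/* Example consumer: add up the string lengths. */
static void sum_lengths(const char *arg, void *ctx)
{
	*(size_t *)ctx += strlen(arg);
}

/* Convenience wrapper: walk the area and accumulate lengths into *total. */
int packed_args_total_len(const char *area, size_t size, int argc, size_t *total)
{
	return walk_packed_args(area, size, argc, sum_lengths, total);
}
```

Whether a short copy should be fatal (BUG_ON) or merely reported (WARN_ON) is exactly the point under discussion; the sketch only shows the error-return shape.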


> > @@ -1208,9 +1209,11 @@ int do_execve(char * filename,
> > if (retval < 0)
> > goto out;
> >  
> > +   tmp = bprm->p;
> > retval = copy_strings(bprm->argc, argv, bprm);
> > if (retval < 0)
> > goto out;
> > +   bprm->argv_len = tmp - bprm->p;
> 
> 
> 
> 
> 
> --- a/include/linux/kernel.h~a
> +++ a/include/linux/kernel.h
> @@ -5,6 +5,8 @@
>   * 'kernel.h' contains some often-used function prototypes etc
>   */
>  
> +#define tmp don't call your variables tmp!
> +
>  #ifdef __KERNEL__
>  
>  #include 

Fair enough. :-/



remove references to dead urls from mtd nand code.

2007-06-05 Thread Dave Jones
As reported in http://bugzilla.kernel.org/show_bug.cgi?id=7815
this URL 404's.  Unless they're coming back, we should probably
just remove them.

Signed-off-by: Dave Jones <[EMAIL PROTECTED]>

diff --git a/drivers/mtd/nand/Kconfig b/drivers/mtd/nand/Kconfig
index f1d60b6..8f665bb 100644
--- a/drivers/mtd/nand/Kconfig
+++ b/drivers/mtd/nand/Kconfig
@@ -7,8 +7,7 @@ menuconfig MTD_NAND
select MTD_NAND_IDS
help
  This enables support for accessing all type of NAND flash
- devices. For further information see
- .
+ devices.
 
 if MTD_NAND
 
diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 7e68203..3f406c7 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -6,9 +6,6 @@
  *   capable of working with almost all NAND chips currently available.
  *   Basic support for AG-AND chips is provided.
  *
- * Additional technical information is available on
- * http://www.linux-mtd.infradead.org/tech/nand.html
- *
  *  Copyright (C) 2000 Steven J. Hill ([EMAIL PROTECTED])
  *   2002-2006 Thomas Gleixner ([EMAIL PROTECTED])
  *
-- 
http://www.codemonkey.org.uk


[BUG] ptraced process waiting on syscall may return kernel internal errnos

2007-06-05 Thread Satoru Takeuchi
Hi,

If a multithreaded process is ptraced while its threads are blocked in a
restartable syscall, some threads may return from the syscall with an errno
that should never be seen by user programs when they receive SIGSTOP. This is
not a rare case because strace sends SIGSTOP to the attached process on its
exit (e.g. on receiving SIGINT from the terminal).

I found this problem on 2.6.22-rc3 and confirmed that 2.6.22-rc4 has the same
problem. The bug is probably in the generic signal code, because it occurs on
both an i386 box and an ia64 box.

This bug is very easy to recreate. I don't know whether or not it is related
to the following bug, which was recently reported by Benjamin Herrenschmidt.

http://lkml.org/lkml/2007/6/4/468

I ran the recreate program on 2.6.22-rc4 with the following patch from Linus
applied, and the bug still occurred.

http://lkml.org/lkml/2007/6/4/471

http://lkml.org/lkml/2007/6/4/471

For more details, please refer to the attached recreate program.
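
As a small aid for anyone checking their own test output, the following sketch classifies the restart codes that are meant to stay inside the kernel. The numeric values are the ones defined in the kernel's include/linux/errno.h; the K_ prefixed macro names and the helper function are hypothetical, since userspace headers do not export these constants.

```c
#include <assert.h>

/* Kernel-internal restart codes; userspace should never observe them. */
#define K_ERESTARTSYS		512
#define K_ERESTARTNOINTR	513
#define K_ERESTARTNOHAND	514
#define K_ERESTART_RESTARTBLOCK	516

/* Returns 1 if err is one of the kernel-internal restart codes, else 0. */
int is_kernel_restart_errno(int err)
{
	switch (err) {
	case K_ERESTARTSYS:
	case K_ERESTARTNOINTR:
	case K_ERESTARTNOHAND:
	case K_ERESTART_RESTARTBLOCK:
		return 1;
	default:
		return 0;
	}
}
```

A test harness could call this on errno after a failed read() to flag the bug automatically.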



BTW, I found one more strace related bug. I'll report it soon...

Thanks,
Satoru

---
/*
 * recreate-signal-mt-ptrace-bug-pipe - recreate a signal bug.
 *
 * ---
 * 
 * Problem
 * ===
 *
 * If a multithreaded process is ptraced while its threads are in a
 * restartable syscall, some threads may return from the syscall with an
 * errno that should never be seen by user programs when they receive
 * SIGSTOP. This is not a rare case because strace sends SIGSTOP to the
 * attached process on its exit.
 *
 * How to recreate
 * ===
 *
 * 1. run this program
 * 
 *$ ./recreate-signal-mt-ptrace-bug-pipe &
 * 
 * 2. run strace and attach this program
 *
 *$ strace -f -p $!
 *
 * 3. C-c on terminal (*1)
 *
 * (*1) Sending SIGSTOP directly to ./recreate-signal-mt-ptrace-bug-pipe
 *      also works.
 *
 * Expected Result
 * ===
 *
 * All threads of this program are detached safely.
 * 
 * Actual Result
 * =
 *
 * Some threads may return from read() with ERESTARTSYS and print the
 * following message.
 *
 *  read() failed with errno 512
 * 
 * Note
 * 
 *
 * This program can't always recreate the problem. However, the
 * probability of recreating it is very high.
 * 
 *--
 * 
 * Copyright 2007 Satoru Takeuchi <[EMAIL PROTECTED]>
 *
 * This software may be used and distributed according to the terms
 * of the GNU General Public License, incorporated herein by reference.
 * 
 */

#include <err.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

static int fd[2];

/* Each thread blocks in read() on the pipe, a restartable syscall. */
void *thread_fn(void *arg)
{
	char c;

	if (read(fd[0], &c, sizeof(char)) < 0)
		err(EXIT_FAILURE, "read() failed with errno %d\n", errno);

	return NULL;
}

#define NTHREAD 64

int main(int argc, char **argv)
{
	pthread_t t[NTHREAD];
	int i;

	if (pipe(fd) < 0)
		err(EXIT_FAILURE, "pipe() failed");

	for (i = 0; i < NTHREAD; i++)
		if (pthread_create(&t[i], NULL, thread_fn, NULL)) {
			warn("pthread_create() failed\n");
			exit(EXIT_FAILURE);
		}

	/* pthread_join() returns 0 on success, so warn only on a non-zero result */
	for (i = 0; i < NTHREAD; i++)
		if (pthread_join(t[i], NULL))
			warn("pthread_join() failed");

	exit(EXIT_SUCCESS);
}


Re: usb-scanner-cameras kernel-2.6.22 and udev-095 problem

2007-06-05 Thread Greg KH
On Tue, Jun 05, 2007 at 07:20:14PM -0500, [EMAIL PROTECTED] wrote:
>  sorry for not responding i was busy (i updated my sys to fc7)
>  on fc7-i386 and on fc-x86-64(2.6.22-rc4-cfq7 SMP PREEMPT x86_64 GNU/Linux) i 
>  can't see usb scanner
>  i test this with hp-6300 and aqfa snapscan-1212u
>  non of them triger creation /dev/scanner-x device
>  xsane cannot see this scanner without this i assume.

Is CONFIG_USB_DEVICE_CLASS enabled in this kernel?

Are _any_ files showing up in /dev/bus/usb/?

thanks,

greg k-h


rc4 libata regression - commit 464cf177

2007-06-05 Thread Sean

Hi Jeff,

Don't see a fix for this issue in your tree, although i think someone
else may have already reported this.   Just thought better safe than
sorry and report perhaps again that rc4 can't boot here because the
sata drives can not be found.  Reverting commit 464cf177 fixes the
problem.

As an aside, booting the patch-reverted rc4 does not resolve the other
issue I just reported to the list[1] (sorry for not cc'ing you).

Cheers.
Sean

[1] http://marc.info/?l=linux-kernel&m=118109439301092&w=2


Re: [PATCH] 2.4.34 - Add some ahci pci ids

2007-06-05 Thread Willy Tarreau
On Fri, Jun 01, 2007 at 07:40:54AM -0400, Jeff Garzik wrote:
> Filippo Carletti wrote:
> >This patch adds support for some chipsets in ahci driver.
> >The list comes from a patch for redhat kernel 2.6.9-34.
> >I only tested ICH8.
> >
> >The original patch also contained these lines (which I omitted):
> >+   /* JMicron-specific fixup: make sure we're in AHCI mode */
> >+   if (pdev->vendor == 0x197b)
> >+   pci_write_config_byte(pdev, 0x41, 0xa1);
> 
> NAK.  Don't omit obviously needed lines, if you are going to add JMicron 
> PCI IDs.

Filippo,

would you please resend your patch including the lines you have omitted ?
Jeff is right, those lines are explicitly labelled as a fixup, so either
we merge the complete patch, or nothing at all.

I expect to release 2.4.35 around the end of this month, so if you can
provide me with an acceptable patch in time so that I can run at least
one -rc with it, I'm OK to merge it (provided that Jeff has no objection
of course). I will merge your other patch (VIA) at the same time.

Thanks in advance,
Willy



Re: [PATCH] fix race in AF_UNIX

2007-06-05 Thread David Miller
From: Miklos Szeredi <[EMAIL PROTECTED]>
Date: Wed, 06 Jun 2007 07:26:52 +0200

> > Holding a global mutex over recvmsg() calls under AF_UNIX is pretty
> > much a non-starter, this will kill performance for multi-threaded
> > apps.
> 
> That's an rwsem held for read.  It's held for write in unix_gc() only
> for a short duration, and unix_gc() should only rarely be called.  So
> I don't think there's any performance problem here.

It pulls a non-local cacheline into the local thread, that's extremely
expensive on SMP.

If everyone starts grabbing this thing during recvmsg() it's going to
become a really hot lock and kill performance, even if it's a read
side lock being taken.

That's why I said we need to investigate solutions involving
u->readlock, that already has to be taken and is local to the socket.
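
The shape of that suggestion, serialising against a lock that lives in the socket rather than one shared by every socket, can be sketched in userspace as follows. This is only an illustration with pthread mutexes: struct usock and the usock_*() functions are hypothetical, and the sketch does not attempt to solve the deadlock-ordering questions a real unix_gc() change would face.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a socket with its own receive-side lock. */
struct usock {
	pthread_mutex_t readlock;	/* local to this socket */
	int queued;			/* pending messages */
};

void usock_init(struct usock *u, int queued)
{
	pthread_mutex_init(&u->readlock, NULL);
	u->queued = queued;
}

/* Receive path: only the socket's own lock (and cacheline) is touched. */
int usock_recv_one(struct usock *u)
{
	int got = 0;

	pthread_mutex_lock(&u->readlock);
	if (u->queued > 0) {
		u->queued--;
		got = 1;
	}
	pthread_mutex_unlock(&u->readlock);
	return got;
}

/* Collector path: visits sockets one at a time under the per-socket lock. */
int usock_scan(struct usock *socks, int n)
{
	int total = 0;
	int i;

	for (i = 0; i < n; i++) {
		pthread_mutex_lock(&socks[i].readlock);
		total += socks[i].queued;
		pthread_mutex_unlock(&socks[i].readlock);
	}
	return total;
}
```

Receivers on different sockets never contend with each other here; only the rare scan touches every lock.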


Re: [RFC] Extend Linux to support proportional-share scheduling

2007-06-05 Thread Willy Tarreau
On Tue, Jun 05, 2007 at 09:31:33PM -0700, Li, Tong N wrote:
> Willy,
> 
> These are all good comments. Regarding the cache penalty, I've done some
> measurements using benchmarks like SPEC OMP on an 8-processor SMP and
> the performance with this patch was nearly identical to that with the
> mainline. I'm sure some apps may suffer from the potentially higher number
> of migrations with this design. In the end, I think what we want is to
> balance fairness and performance. This design currently emphasizes
> fairness, but it could be changed to relax fairness when performance
> does become an issue (which could even be a user-tunable knob depending
> on which aspect the user cares more about).

Maybe storing in each task a small list of the last 2 or 4 CPUs used would
help the scheduler in trying to place them. I mean, let's say you have 10
tasks and 8 CPUs. You first assign tasks 1..8 to CPUs 1..8 for 1 timeslice.
Then you will give 9..10 a run on CPUs 1..2, and CPUs 3..8 will be usable
for other tasks. It will be optimal to run tasks 3..8 on them. Then you will
stop some of those because they are "in advance", and run 9..10 and 1..2
again. You'll have to switch 1..2 to another group of CPUs to maintain hot
cache on CPUs 1..2 for tasks 9..10. But another possibility would be to
consider that 9..10 and 1..2 have performed the same amount of work, so
let 9..10 take some advance and benefit from the hot cache, then try to
place 1..2 there again. But it will mean that 3..8 will now have run 2
timeslices more than the others. At this moment, it would be wise to make
them sleep and keep their CPU history for future use.

Maybe on end-user systems, the CPUs history is not that important because
of the often small caches, but on high-end systems with large L2/L3 caches,
I think that we can often keep several tasks in the cache, justifying the
ability to select one of the last CPUs used.
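
A rough sketch of the bookkeeping this would need is shown here: a tiny most-recent-first list per task, and a picker that prefers an idle CPU from that list. The names (struct task_hist, hist_record(), hist_pick_cpu()), the depth of 4 and the bitmask representation of idle CPUs are all assumptions made for illustration, and ncpus is assumed not to exceed the number of bits in an unsigned long.

```c
#include <assert.h>

#define CPU_HIST 4	/* hypothetical history depth ("2 or 4" in the text) */

struct task_hist {
	int cpu[CPU_HIST];	/* most recently used first */
	int n;			/* number of valid entries */
};

/* Record that the task just ran on cpu, moving it to the front. */
void hist_record(struct task_hist *h, int cpu)
{
	int pos = h->n;		/* default: not found, use the next free slot */
	int i;

	for (i = 0; i < h->n; i++) {
		if (h->cpu[i] == cpu) {
			pos = i;
			break;
		}
	}
	if (pos == CPU_HIST)
		pos = CPU_HIST - 1;	/* full and not found: drop the oldest */
	else if (pos == h->n)
		h->n++;			/* not found, room left: grow */

	for (i = pos; i > 0; i--)
		h->cpu[i] = h->cpu[i - 1];
	h->cpu[0] = cpu;
}

/*
 * Pick the most recently used CPU that is idle according to idle_mask;
 * fall back to the lowest-numbered idle CPU, or -1 if none is idle.
 */
int hist_pick_cpu(const struct task_hist *h, unsigned long idle_mask, int ncpus)
{
	int i;

	for (i = 0; i < h->n; i++)
		if (h->cpu[i] < ncpus && (idle_mask & (1UL << h->cpu[i])))
			return h->cpu[i];
	for (i = 0; i < ncpus; i++)
		if (idle_mask & (1UL << i))
			return i;
	return -1;
}
```

A real scheduler would of course weigh this against fairness and load, which is the hard part mentioned above.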

Not an easy thing to do, but probably very complementary to your work IMHO.

Regards,
Willy



spurious completions during NCQ?

2007-06-05 Thread Florin Iucha
Hello,

I was working on an I/O-heavy workload (parsing 100K spam messages to
extract certain structures) when I got this in the kernel log:

[ 2320.132893] ata1.00: exception Emask 0x2 SAct 0x701f SErr 0x0 action 0x2 frozen
[ 2320.132899] ata1.00: (spurious completions during NCQ issue=0x0 SAct=0x701f FIS=005040a1:0800)
[ 2320.132905] ata1.00: cmd 61/10:00:59:fc:d0/00:00:07:00:00/40 tag 0 cdb 0x0 data 8192 out
[ 2320.132906]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132911] ata1.00: cmd 61/10:08:69:fc:d0/00:00:07:00:00/40 tag 1 cdb 0x0 data 8192 out
[ 2320.132913]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132918] ata1.00: cmd 61/08:10:19:4c:d1/00:00:07:00:00/40 tag 2 cdb 0x0 data 4096 out
[ 2320.132919]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132924] ata1.00: cmd 61/01:18:fb:27:0f/00:00:08:00:00/40 tag 3 cdb 0x0 data 512 out
[ 2320.132925]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132930] ata1.00: cmd 61/08:20:11:28:0f/00:00:08:00:00/40 tag 4 cdb 0x0 data 4096 out
[ 2320.132932]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132937] ata1.00: cmd 61/08:28:91:92:17/00:00:08:00:00/40 tag 5 cdb 0x0 data 4096 out
[ 2320.132938]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132943] ata1.00: cmd 61/08:30:99:b3:17/00:00:08:00:00/40 tag 6 cdb 0x0 data 4096 out
[ 2320.132944]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132949] ata1.00: cmd 61/01:38:5b:56:4b/00:00:05:00:00/40 tag 7 cdb 0x0 data 512 out
[ 2320.132950]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132956] ata1.00: cmd 61/08:40:71:56:4b/00:00:05:00:00/40 tag 8 cdb 0x0 data 4096 out
[ 2320.132957]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132962] ata1.00: cmd 61/08:48:09:cf:5a/00:00:05:00:00/40 tag 9 cdb 0x0 data 4096 out
[ 2320.132963]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132968] ata1.00: cmd 61/01:50:3b:af:8b/00:00:05:00:00/40 tag 10 cdb 0x0 data 512 out
[ 2320.132969]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132974] ata1.00: cmd 61/08:58:51:af:8b/00:00:05:00:00/40 tag 11 cdb 0x0 data 4096 out
[ 2320.132976]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132981] ata1.00: cmd 61/08:60:d9:f1:8d/00:00:05:00:00/40 tag 12 cdb 0x0 data 4096 out
[ 2320.132982]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132987] ata1.00: cmd 61/08:68:49:bd:8e/00:00:05:00:00/40 tag 13 cdb 0x0 data 4096 out
[ 2320.132988]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132993] ata1.00: cmd 61/08:70:d9:63:d1/00:00:05:00:00/40 tag 14 cdb 0x0 data 4096 out
[ 2320.132995]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.133000] ata1.00: cmd 61/08:78:91:39:0f/00:00:06:00:00/40 tag 15 cdb 0x0 data 4096 out
[ 2320.133001]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.133006] ata1.00: cmd 61/08:80:49:30:97/00:00:06:00:00/40 tag 16 cdb 0x0 data 4096 out
[ 2320.133007]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.133013] ata1.00: cmd 61/08:88:41:c0:d0/00:00:06:00:00/40 tag 17 cdb 0x0 data 4096 out
[ 2320.133014]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.133019] ata1.00: cmd 61/08:90:b9:8c:d1/00:00:06:00:00/40 tag 18 cdb 0x0 data 4096 out
[ 2320.133020]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.133025] ata1.00: cmd 61/01:98:1b:cf:ce/00:00:07:00:00/40 tag 19 cdb 0x0 data 512 out
[ 2320.133027]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.133032] ata1.00: cmd 61/08:a0:31:cf:ce/00:00:07:00:00/40 tag 20 cdb 0x0 data 4096 out
[ 2320.133033]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.133038] ata1.00: cmd 61/10:e0:41:af:8b/00:00:05:00:00/40 tag 28 cdb 0x0 data 8192 out
[ 2320.133039]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.133044] ata1.00: cmd 61/01:e8:ba:12:8d/00:00:06:00:00/40 tag 29 cdb 0x0 data 512 out
[ 2320.133046]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.133051] ata1.00: cmd 61/10:f0:c1:12:8d/00:00:06:00:00/40 tag 30 cdb 0x0 data 8192 out
[ 2320.133052]  res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.378387] ata1: soft resetting port
[ 2320.442169] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 2320.460012] ata1.00: ata_hpa_resize 1: sectors = 156301488, hpa_sectors = 156301488
[ 2320.461395] ata1.00: ata_hpa_resize 

Re: [PATCH] fix race in AF_UNIX

2007-06-05 Thread Miklos Szeredi
> > From: Miklos Szeredi <[EMAIL PROTECTED]>
> > Date: Mon, 04 Jun 2007 11:45:32 +0200
> > 
> > > > A recv() on an AF_UNIX, SOCK_STREAM socket can race with a
> > > > send()+close() on the peer, causing recv() to return zero, even though
> > > > the sent data should be received.
> > > > 
> > > > This happens if the send() and the close() are performed between
> > > > skb_dequeue() and checking sk->sk_shutdown in unix_stream_recvmsg():
> > > > 
> > > > process A  skb_dequeue() returns NULL, there's no data in the socket queue
> > > > process B  new data is inserted onto the queue by unix_stream_sendmsg()
> > > > process B  sk->sk_shutdown is set to SHUTDOWN_MASK by unix_release_sock()
> > > > process A  sk->sk_shutdown is checked, unix_stream_recvmsg() returns zero
> > > 
> > > This is only part of the story.  It turns out, there are other races
> > > involving the garbage collector, that can throw away perfectly good
> > > packets with AF_UNIX sockets in them.
> > > 
> > > The problems arise when a socket goes from installed to in-flight or
> > > vice versa during garbage collection.  Since gc is done with a
> > > spinlock held, this only shows up on SMP.
> > > 
> > > The following patch fixes it for me, but it's possibly the wrong
> > > approach.
> > > 
> > > Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
> 
> Concerning this specific patch I think we need to rethink it
> a bit.
> 
> Holding a global mutex over recvmsg() calls under AF_UNIX is pretty
> much a non-starter, this will kill performance for multi-threaded
> apps.

That's an rwsem held for read.  It's held for write in unix_gc() only
for a short duration, and unix_gc() should only rarely be called.  So
I don't think there's any performance problem here.
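
For reference, the behaviour the original race report expects (data sent before close() must still be delivered, and only then end-of-file) can be checked from userspace with a sketch like this. It is single-threaded, so it shows the intended semantics rather than reproducing the race; send_close_recv() is a hypothetical helper name.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send one byte on an AF_UNIX stream socket, close the sender, then read
 * twice from the peer. Stores the two recv() results in *first and *second.
 * Returns 0 on success, -1 if the socket pair could not be set up.
 */
int send_close_recv(ssize_t *first, ssize_t *second)
{
	int sv[2];
	char c = 'x';

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (send(sv[0], &c, 1, 0) != 1) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	close(sv[0]);			/* peer goes away with data still queued */

	*first = recv(sv[1], &c, 1, 0);	/* the queued byte must still arrive */
	*second = recv(sv[1], &c, 1, 0);	/* only now is end-of-file reported */
	close(sv[1]);
	return 0;
}
```

A racing version would run the send()+close() in a second thread while the first is already blocked in recv(); a zero return on the first recv() would indicate the bug.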

> 
> One possible solution is for the garbage collection code to hold the
> u->readlock while processing a socket, but be careful about deadlocks.

That would have exactly the same effect.  Only the code would be more
complicated.

Miklos


Yet another Uniwill laptop with ALC861 codec

2007-06-05 Thread Dave Jones
Rediffed version of the patch from ..
http://bugzilla.kernel.org/show_bug.cgi?id=8016
that seems to be lingering for some time.

Original patch by: Andy Shevchenko <[EMAIL PROTECTED]>
Signed-off-by: Dave Jones <[EMAIL PROTECTED]>

diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index 4776de9..2e6193c 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -8771,6 +8771,7 @@ static struct snd_pci_quirk alc861_cfg_tbl[] = {
SND_PCI_QUIRK(0x1179, 0xff00, "Toshiba", ALC861_TOSHIBA),
SND_PCI_QUIRK(0x1179, 0xff10, "Toshiba", ALC861_TOSHIBA),
SND_PCI_QUIRK(0x1584, 0x9072, "Uniwill m31", ALC861_UNIWILL_M31),
+   SND_PCI_QUIRK(0x1584, 0x9075, "Uniwill", ALC861_UNIWILL_M31),
SND_PCI_QUIRK(0x1584, 0x2b01, "Uniwill X40AIx", ALC861_UNIWILL_M31),
SND_PCI_QUIRK(0x1849, 0x0660, "Asrock 939SLI32", ALC660_3ST),
SND_PCI_QUIRK(0x8086, 0xd600, "Intel", ALC861_3ST),
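
For readers unfamiliar with these tables, the idea is a lookup keyed on the PCI subsystem vendor and device IDs that selects a board model. A simplified userspace sketch follows; struct quirk, quirk_lookup() and the MODEL_* values are hypothetical and this is not the ALSA snd_pci_quirk implementation, only the IDs and names are taken from the hunk above.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical quirk entry keyed on PCI subsystem IDs. */
struct quirk {
	unsigned short subvendor;
	unsigned short subdevice;
	const char *name;
	int value;		/* board model to select */
};

enum { MODEL_TOSHIBA = 1, MODEL_UNIWILL_M31 = 2 };

static const struct quirk quirk_tbl[] = {
	{ 0x1179, 0xff00, "Toshiba",     MODEL_TOSHIBA },
	{ 0x1584, 0x9072, "Uniwill m31", MODEL_UNIWILL_M31 },
	{ 0x1584, 0x9075, "Uniwill",     MODEL_UNIWILL_M31 },	/* the new entry */
	{ 0, 0, NULL, 0 }	/* terminator */
};

/* Return the first matching entry, or NULL if the board is not listed. */
const struct quirk *quirk_lookup(unsigned short subvendor,
				 unsigned short subdevice)
{
	const struct quirk *q;

	for (q = quirk_tbl; q->subvendor; q++)
		if (q->subvendor == subvendor && q->subdevice == subdevice)
			return q;
	return NULL;
}
```

An unlisted board returns NULL, which is why each new laptop variant needs its own line in the table.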


-- 
http://www.codemonkey.org.uk


Re: [BUG] 2.6.22-rc3-mm1 remove bluetooth usb adapter caused kmalloc bug

2007-06-05 Thread Christoph Lameter
Note that the corruption seems to have its cause in a decrement done at
offset 16 into the object pointing to the refcount in struct hci_dev. So 
it looks like the refcount was decremented after the object was freed.

sysfs related?


Re: [patch -mm] alsa mixer_oss kfree fix

2007-06-05 Thread young dave

Hi, Andrew



> kfree(NULL) is legal, and is often used.



Apart from the NULL pointer, IMHO there are two problems that need to be
fixed, though I'm not sure.

What's your opinion?

1. the label indent should be removed
2. similar code uses different labels: some use "__unlock" but others
use "__unalloc"; "__unlock" seems to be misspelled.
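
To make the two points concrete, here is a small userspace analogue: free(NULL) is a no-op just as kfree(NULL) is, so one consistently named cleanup label at column 0 can serve every exit path. The function concat_len() and the label name are hypothetical examples, not code from mixer_oss.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build a + b through two temporary buffers and return the length of the
 * result, or -1 if an allocation fails.
 */
int concat_len(const char *a, const char *b)
{
	char *first = NULL;
	char *both = NULL;
	int ret = -1;

	first = malloc(strlen(a) + 1);
	if (!first)
		goto out;
	strcpy(first, a);

	both = malloc(strlen(a) + strlen(b) + 1);
	if (!both)
		goto out;
	strcpy(both, first);
	strcat(both, b);
	ret = (int)strlen(both);

out:
	free(both);	/* free(NULL) is a no-op, like kfree(NULL) */
	free(first);
	return ret;
}
```

No NULL checks are needed before the frees, and every failure path shares the same label.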

Regards
dave


Re: [RFC] Documentation/CodingStyle: Add rules for goto labels

2007-06-05 Thread WANG Cong
On Tue, Jun 05, 2007 at 11:16:13AM +0200, Rene Herman wrote:
>On 06/05/2007 04:10 AM, WANG Cong wrote:
>
>>On Mon, Jun 04, 2007 at 01:57:51PM -0400, Jeff Garzik wrote:
>
>>>A matter of opinion :)  I tend to think goto is special enough to
>>>warrant column 1 unconditionally.  It is special, so it draws additional
>>>attention over and above case labels.
>>>
>>>I and others have been tripped up when programmers "hide" goto
>>>statements among regular statements.
>>>
>>>IMO goto warrants a big flashing "notice me" sign.
>
>>Hmmm, perhaps.
>>
>>So, it seems that we can reach an agreement. Any other comments or
>>suggestions?
>
>One more -- I absolutely agree with Jeff that goto should stand out as best 
>as possible but I think that's actually more so when they're indented 2 
>columns.
>
>Have been working on a legacy CD-ROM driver lately and puting "out:" labels 
>at 2 spaces started out as a personal style preference of Pekka Enberg (I 
>used to put them at 0) but has grown on me. It makes them clearly fall 
>inside the function, not being aligned at the same level as the next 
> function header, which makes for the "lowest effort visual scan" of all I 
>feel. One is just too little for that, more than 2 is too much...
>
>Here's the last version that was posted:
>
>http://lkml.org/lkml/2007/6/4/50
>
>It gets a little different visually when labels are mostly longer than a 
>simple "out" or "again", or "error" or something like that but if someone's 
>going to try to pin down the label style, I'd like the freedom to have two 
>spaces in front of them...
>
>Rene.

I see. Thank you for your advice. Freedom is a good thing. ;)

>
>P.S: Your message had a Mail-Followup-To set which dropped yourself and 
>turned the others from CCs into TOs. If you can help it, please no header 
>tricks.

Thanks! You are very kind. I will try to fix it!

Regards!



[PATCH] x86_64: use NULL for pointer

2007-06-05 Thread Randy Dunlap
From: Randy Dunlap <[EMAIL PROTECTED]>

Use NULL instead of 0 for pointer:
arch/x86_64/kernel/vsyscall.c:183:21: warning: Using plain integer as NULL 
pointer

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/vsyscall.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.22-rc3-git7.orig/arch/x86_64/kernel/vsyscall.c
+++ linux-2.6.22-rc3-git7/arch/x86_64/kernel/vsyscall.c
@@ -180,7 +180,7 @@ time_t __vsyscall(1) vtime(time_t *t)
if (unlikely(!__vsyscall_gtod_data.sysctl_enabled))
return time_syscall(t);
 
-	vgettimeofday(&tv, 0);
+	vgettimeofday(&tv, NULL);
result = tv.tv_sec;
if (t)
*t = result;


[PATCH] MTD: use NULL for pointer

2007-06-05 Thread Randy Dunlap
From: Randy Dunlap <[EMAIL PROTECTED]>

Use NULL instead of 0 for pointer:
drivers/mtd/chips/cfi_cmdset_0001.c:2258:43: warning: Using plain integer as 
NULL pointer

Other changes by inspection.

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 drivers/mtd/chips/cfi_cmdset_0001.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

--- linux-2.6.22-rc3-git7.orig/drivers/mtd/chips/cfi_cmdset_0001.c
+++ linux-2.6.22-rc3-git7/drivers/mtd/chips/cfi_cmdset_0001.c
@@ -1930,7 +1930,7 @@ static int cfi_intelext_lock(struct mtd_
printk(KERN_DEBUG "%s: lock status before, ofs=0x%08llx, len=0x%08X\n",
   __FUNCTION__, ofs, len);
cfi_varsize_frob(mtd, do_printlockstatus_oneblock,
-   ofs, len, 0);
+   ofs, len, NULL);
 #endif
 
ret = cfi_varsize_frob(mtd, do_xxlock_oneblock,
@@ -1940,7 +1940,7 @@ static int cfi_intelext_lock(struct mtd_
printk(KERN_DEBUG "%s: lock status after, ret=%d\n",
   __FUNCTION__, ret);
cfi_varsize_frob(mtd, do_printlockstatus_oneblock,
-   ofs, len, 0);
+   ofs, len, NULL);
 #endif
 
return ret;
@@ -1954,7 +1954,7 @@ static int cfi_intelext_unlock(struct mt
printk(KERN_DEBUG "%s: lock status before, ofs=0x%08llx, len=0x%08X\n",
   __FUNCTION__, ofs, len);
cfi_varsize_frob(mtd, do_printlockstatus_oneblock,
-   ofs, len, 0);
+   ofs, len, NULL);
 #endif
 
ret = cfi_varsize_frob(mtd, do_xxlock_oneblock,
@@ -1964,7 +1964,7 @@ static int cfi_intelext_unlock(struct mt
printk(KERN_DEBUG "%s: lock status after, ret=%d\n",
   __FUNCTION__, ret);
cfi_varsize_frob(mtd, do_printlockstatus_oneblock,
-   ofs, len, 0);
+   ofs, len, NULL);
 #endif
 
return ret;
@@ -2255,7 +2255,7 @@ static void cfi_intelext_save_locks(stru
adr = region->offset + block * len;
 
status = cfi_varsize_frob(mtd,
-   do_getlockstatus_oneblock, adr, len, 0);
+   do_getlockstatus_oneblock, adr, len, NULL);
if (status)
set_bit(block, region->lockmap);
else


Re: signalfd API issues (was Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes)

2007-06-05 Thread Davide Libenzi
On Tue, 5 Jun 2007, Linus Torvalds wrote:

> On Tue, 5 Jun 2007, Davide Libenzi wrote:
> > On Wed, 6 Jun 2007, Benjamin Herrenschmidt wrote:
> > > 
> > > Yeah, synchronous signals should probably never be delivered to another
> > > process, even via signalfd. There's no point delivering a SEGV to
> > > somebody else :-)
> > 
> > That'd be a limitation. Like you can choose to not handle SEGV, you can 
> > choose to have a signalfd listening to it. Of course, not with the 
> > intention to *handle* the signal, but with a notification intent.
> 
> I agree that it would be a limitation, but it would be a sane one.
> 
> How about we try to live with that limitation, if only to avoid the issue 
> of having the private signals being stolen by anybody else. If we actually 
> find a real-live use-case where that is bad in the future, we can re-visit 
> the issue - it's always easier to _expand_ semantics later than it is to 
> restrict them, so I think this thread is a good argument for starting it 
> out in a more restricted form before people start depending on semantics 
> that can be nasty..

Yeah, that's easy. We can exclude them at signalfd creation time.
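
In userspace terms, the filtering could look roughly like the sketch below, which strips fault-type signals from a mask before it is used. Which signals count as synchronous is an assumption here (SIGSEGV, SIGBUS, SIGILL, SIGFPE, SIGTRAP), and mask_drop_synchronous() is a hypothetical helper, not the signalfd implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Remove synchronous (fault-type) signals from mask, leaving the rest. */
void mask_drop_synchronous(sigset_t *mask)
{
	/* Assumed set of synchronous signals for this sketch. */
	static const int sync_sigs[] = {
		SIGSEGV, SIGBUS, SIGILL, SIGFPE, SIGTRAP
	};
	size_t i;

	for (i = 0; i < sizeof(sync_sigs) / sizeof(sync_sigs[0]); i++)
		sigdelset(mask, sync_sigs[i]);
}
```

Applied once at creation time, this keeps the private signals with the faulting thread while everything else stays available for notification.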



- Davide




RE: [RFC] Extend Linux to support proportional-share scheduling

2007-06-05 Thread Li, Tong N
Willy,

These are all good comments. Regarding the cache penalty, I've done some
measurements using benchmarks like SPEC OMP on an 8-processor SMP and
the performance with this patch was nearly identical to that with the
mainline. I'm sure some apps may suffer from the potentially more
frequent migrations with this design. In the end, I think what we want
is to balance fairness and performance. This design currently emphasizes
fairness, but it could be changed to relax fairness when performance
does become an issue (which could even be a user-tunable knob depending
on which aspect the user cares more about).

Thanks,

  tong

> -----Original Message-----
> From: Willy Tarreau [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, June 05, 2007 8:33 PM
> To: Li, Tong N
> Cc: linux-kernel@vger.kernel.org; Ingo Molnar; Con Kolivas; Linus Torvalds;
> Arjan van de Ven; Siddha, Suresh B; Barnes, Jesse; William Lee Irwin III;
> Bill Huey (hui); [EMAIL PROTECTED]; [EMAIL PROTECTED]; Nick Piggin; Bill
> Davidsen; John Kingman; Peter Williams; [EMAIL PROTECTED]
> Subject: Re: [RFC] Extend Linux to support proportional-share scheduling
> 
> Hi Tong,
> 
> On Tue, Jun 05, 2007 at 06:56:17PM -0700, Li, Tong N wrote:
> > Hi all,
> >
> > I've ported my code to mainline 2.6.21.3. You can get it at
> > http://www.cs.duke.edu/~tongli/linux/.
> 
> as much as possible, you should post your patch for others to comment
> on it. Posting just a URL is often fine to inform people that there's
> an update to *try*, but at this stage, it may be more important to
> comment on your design and code than trying it.
> 
> [...]
> 
> > Trio has two unique features: (1) it enables users to control shares of
> > CPU time for any thread or group of threads (e.g., a process, an
> > application, etc.), and (2) it enables fair sharing of CPU time across
> > multiple CPUs. For example, with ten tasks running on eight CPUs, Trio
> > allows each task to take an equal fraction of the total CPU time,
> 
> While this looks interesting, doesn't it make threads jump to random
> CPUs all the time, thus reducing cache efficiency ? Or maybe it would
> be good to consider two or three criteria to group CPUs :
>   - those which share the same caches (multi-core)
>   - those which share the same local memory on the same mainboard
> (multi-socket)
>   - those which are so far away from each other that it's really
> not worth migrating a task
> 
> > whereas no existing scheduler achieves such fairness. These features
> > enable Trio to complement the mainline scheduler and other proposals
> > such as CFS and SD to enable greater user flexibility and stronger
> > fairness.
> 
> Right now, I think that only benchmarks could tell which design is
> better. I understand that running 10 tasks on 8 CPUs may result in
> the last batch involving only 2 CPUs with 1 task each, thus increasing
> the overall wall time. But maybe cache thrashing between CPUs will
> also increase the wall time.
> 
> Regards,
> Willy


Re: [BUG] 2.6.22-rc3-mm1 remove bluetooth usb adapter caused kmalloc bug

2007-06-05 Thread Andrew Morton
On Wed, 6 Jun 2007 03:27:31 + "young dave" <[EMAIL PROTECTED]> wrote:

> Hi,
> > Are you able to reproduce this in 2.6.22-rc4?
> 
> The bug doesn't seem to exist in 2.6.22-rc4. I have tested it, and
> unplugging produces no kernel bug message.
> 

OK, thanks.  I'll drop bluetooth-postpone-hci_dev-unregistration.patch -
let's see if that helps.



Re: signalfd API issues (was Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes)

2007-06-05 Thread Benjamin Herrenschmidt

> a) Process-global signals can be read by any thread (inside or outside
> of the process receiving the signal).
> 
> Rationale:
>   This should always work, so there's no reason to limit it.

I agree, with an appropriate fix to recalc_sigpending_tsk() to only
clear TIF_SIGPENDING if tsk == current (the patch Linus posted
basically) _along_ with a fix to avoid the notifier thingy if stealing
from another task, that would work.
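The rule described above (set the flag for any task, but clear it only for the task doing the recalculation) can be sketched as a toy model. This is not the kernel's recalc_sigpending_tsk() nor the patch itself; `struct toy_task` and `toy_recalc_sigpending` are hypothetical names used only to illustrate the asymmetry.

```c
#include <stdbool.h>

/* Toy stand-in for a task; not the kernel's task_struct. */
struct toy_task {
    bool has_pending;     /* some unblocked signal is queued */
    bool tif_sigpending;  /* models the TIF_SIGPENDING flag */
};

/* Set the flag whenever a signal is pending, but clear it only when
 * the task is recalculating its own state (t == current_task). */
static void toy_recalc_sigpending(struct toy_task *t,
                                  const struct toy_task *current_task)
{
    if (t->has_pending)
        t->tif_sigpending = true;
    else if (t == current_task)
        t->tif_sigpending = false;
}
```

With this shape, a reader that dequeues a signal on behalf of another task can never clear that task's flag behind its back; the worst case is a spurious wakeup that the target resolves itself.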

> b) Thread-specific signals can only be read by their target thread.
> 
> Rationale:
>   This behavior is required by POSIX, and if an application is using
> pthread_kill()/tkill()/tgkill()/etc. to specifically direct a signal, it
> damn well better get to where the app wants it to go.

I agree there too. I don't see the point of the 'feature' of allowing
those to be stolen; it can only lead to all sorts of new headaches
nobody needs.

> c) Synchronous signals ("Naturally" generated SIGILL, SIGFPE, SIGSEGV,
> SIGBUS, and SIGTRAP. Did I miss any?) are not delivered via signalfd()
> at all. (And by "naturally" generated, I mean signals that would have
> the SI_KERNEL flag set.)

Heh, well, as you say later, it can't be delivered anyway... I don't
think we need to do anything explicit to prevent them from being read()
in signalfd, it will just not happen.

> Rationale: 
>   These are a subset of thread-specific signals, so they can only be read
> from a signalfd by their target thread.
> 
> However, there's no way for the target thread to get the signal because
> it is either:
> 
> a) not blocked in a syscall waiting for signal delivery and thus further
> execution beyond the instruction causing the signal is impossible
>  OR
> b) it is blocked in a syscall waiting for signal delivery and the error
> is caused by the signal delivery mechanism itself (i.e. a bad pointer
> passed to read/select/poll/epoll_wait/etc.) and thus the signal can't be
> delivered

Ben.




[PATCH] x86_64: remove extra extern declaring about dmi_ioremap

2007-06-05 Thread Yinghai Lu

[PATCH] x86_64: remove extra extern declaring about dmi_ioremap

Signed-off-by: Yinghai Lu <[EMAIL PROTECTED]>

diff --git a/include/asm-x86_64/dmi.h b/include/asm-x86_64/dmi.h
index 93b2b15..fc7b576 100644
--- a/include/asm-x86_64/dmi.h
+++ b/include/asm-x86_64/dmi.h
@@ -3,15 +3,12 @@
 
 #include 
 
-extern void *dmi_ioremap(unsigned long addr, unsigned long size);
-extern void dmi_iounmap(void *addr, unsigned long size);
-
 #define DMI_MAX_DATA 2048
 
 extern int dmi_alloc_index;
 extern char dmi_alloc_data[DMI_MAX_DATA];
 
-/* This is so early that there is no good way to allocate dynamic memory. 
+/* This is so early that there is no good way to allocate dynamic memory.
Allocate data in an BSS array. */
 static inline void *dmi_alloc(unsigned len)
 {


Re: [PATCH] x86_64: change dm_ioremap to ioremap

2007-06-05 Thread Yinghai Lu

On 6/5/07, Andi Kleen <[EMAIL PROTECTED]> wrote:

init_memory_mapping is not enough; it also needs a working
page allocator. But at this point there is only bootmem.
I don't think the patch is correct.


You are right, it failed on one 2G system.

YH


Re: signalfd API issues (was Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes)

2007-06-05 Thread Nicholas Miell
On Tue, 2007-06-05 at 20:37 -0700, Linus Torvalds wrote:
> 
> On Tue, 5 Jun 2007, Davide Libenzi wrote:
> > On Wed, 6 Jun 2007, Benjamin Herrenschmidt wrote:
> > > 
> > > Yeah, synchronous signals should probably never be delivered to another
> > > process, even via signalfd. There's no point delivering a SEGV to
> > > somebody else :-)
> > 
> > That'd be a limitation. Like you can choose to not handle SEGV, you can 
> > choose to have a signalfd listening to it. Of course, not with the 
> > intention to *handle* the signal, but with a notification intent.
> 
> I agree that it would be a limitation, but it would be a sane one.
> 
> How about we try to live with that limitation, if only to avoid the issue 
> of having the private signals being stolen by anybody else. If we actually 
> find a real-live use-case where that is bad in the future, we can re-visit 
> the issue - it's always easier to _expand_ semantics later than it is to 
> restrict them, so I think this thread is a good argument for starting it 
> out in a more restricted form before people start depending on semantics 
> that can be nasty..
> 
>   Linus

Proposed semantics:

a) Process-global signals can be read by any thread (inside or outside
of the process receiving the signal).

Rationale:
This should always work, so there's no reason to limit it.

b) Thread-specific signals can only be read by their target thread.

Rationale:
This behavior is required by POSIX, and if an application is using
pthread_kill()/tkill()/tgkill()/etc. to specifically direct a signal, it
damn well better get to where the app wants it to go.

c) Synchronous signals ("Naturally" generated SIGILL, SIGFPE, SIGSEGV,
SIGBUS, and SIGTRAP. Did I miss any?) are not delivered via signalfd()
at all. (And by "naturally" generated, I mean signals that would have
the SI_KERNEL flag set.)

Rationale: 
These are a subset of thread-specific signals, so they can only be read
from a signalfd by their target thread.

However, there's no way for the target thread to get the signal because
it is either:

a) not blocked in a syscall waiting for signal delivery and thus further
execution beyond the instruction causing the signal is impossible
 OR
b) it is blocked in a syscall waiting for signal delivery and the error
is caused by the signal delivery mechanism itself (i.e. a bad pointer
passed to read/select/poll/epoll_wait/etc.) and thus the signal can't be
delivered
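The three proposed rules can be written down as a single predicate. This is only a sketch of the semantics proposed in this message; `enum sig_kind`, the thread IDs, and `signalfd_may_read` are hypothetical names and do not correspond to existing kernel code.

```c
#include <stdbool.h>

/* How a pending signal was generated, per the three rules above. */
enum sig_kind {
    KIND_PROCESS_GLOBAL,   /* rule a: sent to the process as a whole */
    KIND_THREAD_SPECIFIC,  /* rule b: directed with pthread_kill()/tkill()/tgkill() */
    KIND_SYNCHRONOUS       /* rule c: "naturally" generated fault */
};

/* May the thread `reader_tid` read a signal of this kind, targeted at
 * `target_tid`, through a signalfd? */
static bool signalfd_may_read(enum sig_kind kind, int reader_tid, int target_tid)
{
    switch (kind) {
    case KIND_PROCESS_GLOBAL:
        return true;                      /* any thread, inside or outside */
    case KIND_THREAD_SPECIFIC:
        return reader_tid == target_tid;  /* only the target thread */
    case KIND_SYNCHRONOUS:
        return false;                     /* never via signalfd */
    }
    return false;
}
```

Rule c returns false even for the target thread, which matches the rationale: the faulting thread cannot be in a position to perform the read.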

-- 
Nicholas Miell <[EMAIL PROTECTED]>



Re: signalfd API issues (was Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes)

2007-06-05 Thread Benjamin Herrenschmidt
> That'd be a limitation. Like you can choose to not handle SEGV, you can 
> choose to have a signalfd listening to it. Of course, not with the 
> intention to *handle* the signal, but with a notification intent.

Hrm.. either you handle it or you are dead ... I fail to see how
signalfd can sneak in to catch it just at the right time...

> > I'm actually thinking we should -also- only handle shared signals in
> > dequeue_signal() when called from a different task.
> 
> Why do you want to impose this? signalfd is a "sniffer", and the user 
> controls what it can dequeue/sniff or what not. I don't see a reason of 
> imposing such limits, unless there're clear technical issues.

Well, a synchronous signal such as SIGSEGV, SIGILL or SIGFPE generally
means that execution cannot continue unless the signal handler does
something about it...

I think you are opening a whole can of worms here.

> > Well.. you certainly need to instanciate a signalfd for every thread in
> > the process if you want to get shared signals for sure.
> 
> Why? Or better, what do you mean for "instanciate"?

Well, because the kernel makes the decision of which thread to target
the signal for a shared signal at emission time (though it -can- be
caught by another thread).

/me reads more code to be sure..

Oh well, a read from signalfd created on one thread -will- catch any
shared signal that was already pending, whatever thread the kernel
decided to target it at, it seems (that is, whatever thread actually got
TIF_SIGPENDING set), but will only catch private signals for _that_
thread (and I still think catching private signals is a wrong thing).

However, I'm not sure about the wakeup condition. signalfd_deliver will
wake up anybody in the signalfd_list, which is -not- whoever is blocked
in signalfd_read() unless I'm missing something.

For your scheme to work, in signalfd_read() you probably need to keep a
separate list of people to be notified of signals and add current to it
from signalfd_list().

Ben.




Re: signalfd API issues (was Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes)

2007-06-05 Thread Linus Torvalds


On Tue, 5 Jun 2007, Davide Libenzi wrote:
> On Wed, 6 Jun 2007, Benjamin Herrenschmidt wrote:
> > 
> > Yeah, synchronous signals should probably never be delivered to another
> > process, even via signalfd. There's no point delivering a SEGV to
> > somebody else :-)
> 
> That'd be a limitation. Like you can choose to not handle SEGV, you can 
> choose to have a signalfd listening to it. Of course, not with the 
> intention to *handle* the signal, but with a notification intent.

I agree that it would be a limitation, but it would be a sane one.

How about we try to live with that limitation, if only to avoid the issue 
of having the private signals being stolen by anybody else. If we actually 
find a real-live use-case where that is bad in the future, we can re-visit 
the issue - it's always easier to _expand_ semantics later than it is to 
restrict them, so I think this thread is a good argument for starting it 
out in a more restricted form before people start depending on semantics 
that can be nasty..

Linus


Re: [RFC] Extend Linux to support proportional-share scheduling

2007-06-05 Thread Willy Tarreau
Hi Tong,

On Tue, Jun 05, 2007 at 06:56:17PM -0700, Li, Tong N wrote:
> Hi all,
> 
> I've ported my code to mainline 2.6.21.3. You can get it at
> http://www.cs.duke.edu/~tongli/linux/.

as much as possible, you should post your patch for others to comment
on it. Posting just a URL is often fine to inform people that there's
an update to *try*, but at this stage, it may be more important to
comment on your design and code than trying it.

[...]

> Trio has two unique features: (1) it enables users to control shares of
> CPU time for any thread or group of threads (e.g., a process, an
> application, etc.), and (2) it enables fair sharing of CPU time across
> multiple CPUs. For example, with ten tasks running on eight CPUs, Trio
> allows each task to take an equal fraction of the total CPU time,

While this looks interesting, doesn't it make threads jump to random
CPUs all the time, thus reducing cache efficiency ? Or maybe it would
be good to consider two or three criteria to group CPUs :
  - those which share the same caches (multi-core)
  - those which share the same local memory on the same mainboard
(multi-socket)
  - those which are so far away from each other that it's really
not worth migrating a task

> whereas no existing scheduler achieves such fairness. These features
> enable Trio to complement the mainline scheduler and other proposals
> such as CFS and SD to enable greater user flexibility and stronger
> fairness.

Right now, I think that only benchmarks could tell which design is
better. I understand that running 10 tasks on 8 CPUs may result in
the last batch involving only 2 CPUs with 1 task each, thus increasing
the overall wall time. But maybe cache thrashing between CPUs will
also increase the wall time.
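The trade-off can be put into numbers with a small back-of-the-envelope model. It assumes every task needs the same amount of CPU time and that migrations are free, which is exactly the assumption the cache-thrashing concern questions; the function names are made up for this sketch.

```c
/* Wall time when tasks run to completion in batches of `cpus` tasks,
 * each task needing `work` seconds of CPU time. */
static double batch_wall_time(int tasks, int cpus, double work)
{
    int batches = (tasks + cpus - 1) / cpus;  /* ceiling division */
    return batches * work;
}

/* Wall time when every task gets an equal share of all CPUs and
 * migration costs nothing. */
static double fair_wall_time(int tasks, int cpus, double work)
{
    if (tasks <= cpus)
        return work;             /* each task has a CPU to itself */
    return work * tasks / cpus;  /* each task runs at cpus/tasks speed */
}
```

For 10 tasks on 8 CPUs with 1 second of work each, the batch model gives 2 seconds (the second batch keeps only 2 CPUs busy) and the fair model gives 1.25 seconds. Any per-migration cache penalty eats into that 0.75 second margin, so only measurements can say which effect dominates.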

Regards,
Willy



Re: signalfd API issues (was Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes)

2007-06-05 Thread Davide Libenzi
On Wed, 6 Jun 2007, Benjamin Herrenschmidt wrote:

> On Tue, 2007-06-05 at 17:58 -0700, Nicholas Miell wrote:
> > 
> > "At the time of generation, a determination shall be made whether the
> > signal has been generated for the process or for a specific thread
> > within the process. Signals which are generated by some action
> > attributable to a particular thread, such as a hardware fault, shall
> > be generated for the thread that caused the signal to be generated.
> 
> Yeah, synchronous signals should probably never be delivered to another
> process, even via signalfd. There's no point delivering a SEGV to
> somebody else :-)

That'd be a limitation. Like you can choose to not handle SEGV, you can 
choose to have a signalfd listening to it. Of course, not with the 
intention to *handle* the signal, but with a notification intent.



> I'm actually thinking we should -also- only handle shared signals in
> dequeue_signal() when called from a different task.

Why do you want to impose this? signalfd is a "sniffer", and the user 
controls what it can dequeue/sniff or what not. I don't see a reason of 
imposing such limits, unless there're clear technical issues.



> > dequeue_signal(tsk, ...) looks for signals first in tsk->pending and
> > then in tsk->signal->shared_pending.
> > 
> > sys_signalfd() stores current in signalfd_ctx. signalfd_read() passes
> > that context to signalfd_dequeue, which passes that that saved
> > task_struct pointer to dequeue_signal.
> > 
> > This means that a signalfd will deliver signals targeted towards
> > either the original thread that created that signalfd, or signals
> > targeted towards the process as a whole.
> >
> > This means that a single signalfd is not adequate to handle signal
> > delivery for all threads in a process, because signals targeted
> > towards threads other than the thread that originally created the
> > signalfd will never be queued to that signalfd.
> 
> Well.. you certainly need to instanciate a signalfd for every thread in
> the process if you want to get shared signals for sure.

Why? Or better, what do you mean for "instanciate"?



- Davide




Re: [patch] cpusets: do not allow TIF_MEMDIE tasks to allocate globally

2007-06-05 Thread David Rientjes
On Tue, 5 Jun 2007, Christoph Lameter wrote:

> H... But we have sent it a SIGKILL. If the process is following 
> conventions then it is exiting. Of course the process could be abusing the 
> system and attempting to OOM the whole system as an act of revenge for 
> being killed but isn't this a bit far fetched?
> 

It's not abusing the system because it was killed, it was killed because 
it was abusing the system and attempted to mlock more memory than allowed 
in its exclusive mems.  So we OOM the task but it can continue mlock'ing 
memory and intrude on the memory of other cpusets because of the 
TIF_MEMDIE exception and our SIGKILL hasn't been handled yet.


Re: [BUG] 2.6.22-rc3-mm1 remove bluetooth usb adapter caused kmalloc bug

2007-06-05 Thread young dave

Hi,

Are you able to reproduce this in 2.6.22-rc4?


The bug doesn't seem to exist in 2.6.22-rc4. I have tested it, and
unplugging produces no kernel bug message.

Regards
dave


Re: [AGPGART] intel_agp: Add support for G33, Q33 and Q35 chipsets

2007-06-05 Thread Wang Zhenyu
On 2007.06.06 09:01:57 +, Wang Zhenyu wrote:
> 
> This patch adds pci ids for G33, Q33 and Q35 chips
> 
> It bases on below intel-agp patches currently in -mm tree:
> intel_agp-cleanup-intel-private-data.patch
> intel_agp-cleanup-intel-private-data-update.patch
> intel_agp-use-table-for-device-probe.patch
> intel_agp-use-table-for-device-probe-update.patch
> intel_agp-add-support-for-965gme-gle.patch
> intel_agp-add-support-for-945gme.patch
> 

Oops, a break statement was missing in the GTT size detection switch.
Here's the updated patch. Thanks to Kent Liu for the report.
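For readers following along, the GTT size detection from the patch below can be sketched as a standalone function. The three G33_PGETBL_* values are copied from the patch; the function name `g33_gtt_size_kb` is hypothetical, and treating the extra 4 KB as BIOS popup space is an assumption carried over from the comment on the neighbouring i965 branch.

```c
#define G33_PGETBL_SIZE_MASK (3 << 8)
#define G33_PGETBL_SIZE_1M   (1 << 8)
#define G33_PGETBL_SIZE_2M   (2 << 8)

/* Returns the GTT size in KB for a given gmch_ctrl value, including
 * the 4 KB added after the switch in the patch. */
static int g33_gtt_size_kb(unsigned int gmch_ctrl)
{
    int size;

    switch (gmch_ctrl & G33_PGETBL_SIZE_MASK) {
    case G33_PGETBL_SIZE_1M:
        size = 1024;
        break;  /* without this, control would fall through to the 2M case */
    case G33_PGETBL_SIZE_2M:
        size = 2048;
        break;
    default:
        size = 512;  /* unknown encoding: assume 512 KB, as the patch does */
        break;
    }
    return size + 4;
}
```

A missing break after the 1M case, for example, would silently report 2048 KB for a 1 MB table, which is the kind of error a fall-through in this switch produces. The widening of I830_GMCH_GMS_MASK from 0x70 to 0xF0 in the same patch follows from the new stolen-memory encodings: (0x8 << 4) is 0x80, which the old mask would have cleared.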

[AGPGART] intel_agp: Add support for G33, Q33 and Q35 chipsets

This patch adds PCI IDs for the G33, Q33 and Q35 chips, and updates the
GTT size and stolen memory size detection for these chips.

Signed-off-by: Wang Zhenyu <[EMAIL PROTECTED]>
---
 drivers/char/agp/agp.h   |6 +++-
 drivers/char/agp/intel-agp.c |   79 -
 2 files changed, 82 insertions(+), 3 deletions(-)

diff --git a/drivers/char/agp/agp.h b/drivers/char/agp/agp.h
index fdbca25..35ab1a9 100644
--- a/drivers/char/agp/agp.h
+++ b/drivers/char/agp/agp.h
@@ -176,7 +176,7 @@ struct agp_bridge_data {
 #define I830_GMCH_MEM_MASK 0x1
 #define I830_GMCH_MEM_64M  0x1
 #define I830_GMCH_MEM_128M 0
-#define I830_GMCH_GMS_MASK 0x70
+#define I830_GMCH_GMS_MASK 0xF0
 #define I830_GMCH_GMS_DISABLED 0x00
 #define I830_GMCH_GMS_LOCAL0x10
 #define I830_GMCH_GMS_STOLEN_512   0x20
@@ -231,6 +231,10 @@ struct agp_bridge_data {
 #define I965_PGETBL_SIZE_512KB (0 << 1)
 #define I965_PGETBL_SIZE_256KB (1 << 1)
 #define I965_PGETBL_SIZE_128KB (2 << 1)
+#define G33_PGETBL_SIZE_MASK(3 << 8)
+#define G33_PGETBL_SIZE_1M  (1 << 8)
+#define G33_PGETBL_SIZE_2M  (2 << 8)
+
 #define I810_DRAM_CTL  0x3000
 #define I810_DRAM_ROW_00x0001
 #define I810_DRAM_ROW_0_SDRAM  0x0001
diff --git a/drivers/char/agp/intel-agp.c b/drivers/char/agp/intel-agp.c
index 3c4a1c2..d383168 100644
--- a/drivers/char/agp/intel-agp.c
+++ b/drivers/char/agp/intel-agp.c
@@ -22,6 +22,12 @@
 #define PCI_DEVICE_ID_INTEL_82965GM_IG  0x2A02
 #define PCI_DEVICE_ID_INTEL_82965GME_IG 0x2A12
 #define PCI_DEVICE_ID_INTEL_82945GME_IG 0x27AE
+#define PCI_DEVICE_ID_INTEL_G33_HB  0x29C0
+#define PCI_DEVICE_ID_INTEL_G33_IG  0x29C2
+#define PCI_DEVICE_ID_INTEL_Q35_HB  0x29B0
+#define PCI_DEVICE_ID_INTEL_Q35_IG  0x29B2
+#define PCI_DEVICE_ID_INTEL_Q33_HB  0x29D0
+#define PCI_DEVICE_ID_INTEL_Q33_IG  0x29D2
 
 #define IS_I965 (agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82946GZ_HB || \
  agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82965G_1_HB || \
@@ -29,6 +35,9 @@
  agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82965G_HB || \
  agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82965GM_HB)
 
+#define IS_G33 (agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_G33_HB || \
+   agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_Q35_HB || \
+   agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_Q33_HB)
 
 extern int agp_memory_reserved;
 
@@ -55,6 +64,8 @@ extern int agp_memory_reserved;
 #define I915_PTEADDR   0x1C
 #define I915_GMCH_GMS_STOLEN_48M   (0x6 << 4)
 #define I915_GMCH_GMS_STOLEN_64M   (0x7 << 4)
+#define G33_GMCH_GMS_STOLEN_128M   (0x8 << 4)
+#define G33_GMCH_GMS_STOLEN_256M   (0x9 << 4)
 
 /* Intel 965G registers */
 #define I965_MSAC 0x62
@@ -448,6 +459,22 @@ static void intel_i830_init_gtt_entries(void)
size = 512;
}
size += 4; /* add in BIOS popup space */
+   } else if (IS_G33) {
+   /* G33's GTT size defined in gmch_ctrl */
+   switch (gmch_ctrl & G33_PGETBL_SIZE_MASK) {
+   case G33_PGETBL_SIZE_1M:
+   size = 1024;
+   break;
+   case G33_PGETBL_SIZE_2M:
+   size = 2048;
+   break;
+   default:
+   printk(KERN_INFO PFX "Unknown page table size 0x%x, "
+   "assuming 512KB\n",
+   (gmch_ctrl & G33_PGETBL_SIZE_MASK));
+   size = 512;
+   }
+   size += 4;
} else {
/* On previous hardware, the GTT size was just what was
 * required to map the aperture.
@@ -499,7 +526,8 @@ static void intel_i830_init_gtt_entries(void)
if (agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82915G_HB ||
agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82915GM_HB ||
agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82945G_HB ||
-   agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82945GM_HB || IS_I965 )
+ 

Re: signalfd API issues (was Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes)

2007-06-05 Thread Benjamin Herrenschmidt
On Tue, 2007-06-05 at 17:58 -0700, Nicholas Miell wrote:
> 
> "At the time of generation, a determination shall be made whether the
> signal has been generated for the process or for a specific thread
> within the process. Signals which are generated by some action
> attributable to a particular thread, such as a hardware fault, shall
> be generated for the thread that caused the signal to be generated.

Yeah, synchronous signals should probably never be delivered to another
process, even via signalfd. There's no point delivering a SEGV to
somebody else :-)

I'm actually thinking we should -also- only handle shared signals in
dequeue_signal() when called from a different task.

> dequeue_signal(tsk, ...) looks for signals first in tsk->pending and
> then in tsk->signal->shared_pending.
> 
> sys_signalfd() stores current in signalfd_ctx. signalfd_read() passes
> that context to signalfd_dequeue, which passes that that saved
> task_struct pointer to dequeue_signal.
> 
> This means that a signalfd will deliver signals targeted towards
> either the original thread that created that signalfd, or signals
> targeted towards the process as a whole.
>
> This means that a single signalfd is not adequate to handle signal
> delivery for all threads in a process, because signals targeted
> towards threads other than the thread that originally created the
> signalfd will never be queued to that signalfd.

Well.. you certainly need to instanciate a signalfd for every thread in
the process if you want to get shared signals for sure.
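The lookup order described in the quoted text (the task's own pending set first, then the process-wide shared set) can be modelled with two bitmasks. This is a toy illustration and not the kernel's dequeue_signal(); `struct toy_thread`, `struct toy_process` and both functions are hypothetical, and signals are picked lowest-number-first purely for simplicity.

```c
#include <stdint.h>

struct toy_process {
    uint64_t shared_pending;      /* signals sent to the whole process */
};

struct toy_thread {
    uint64_t pending;             /* signals directed at this thread only */
    struct toy_process *process;  /* process the thread belongs to */
};

/* Remove and return the lowest-numbered signal in *set, or 0 if empty. */
static int take_lowest(uint64_t *set)
{
    int sig;

    for (sig = 1; sig <= 64; sig++) {
        uint64_t bit = (uint64_t)1 << (sig - 1);
        if (*set & bit) {
            *set &= ~bit;
            return sig;
        }
    }
    return 0;
}

/* Private signals of `t` first, then the shared ones. Signals private
 * to other threads are never visible from here. */
static int toy_dequeue_signal(struct toy_thread *t)
{
    int sig = take_lowest(&t->pending);

    if (sig == 0)
        sig = take_lowest(&t->process->shared_pending);
    return sig;
}
```

Because a signalfd keeps the task that created it and always dequeues on behalf of that task, a descriptor created by thread A in this model drains A's private signals and the shared ones, but never those queued for thread B.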

BTW. Not directly related, but that notifier thing ... it looks really
really dodgy. It's also only ever used by the DRM. Does anybody around know
why that's in and why the DRM cannot just use normal signal blocking
techniques?

Cheers,
Ben.




Re: [BUG] 2.6.22-rc3-mm1 remove bluetooth usb adapter caused kmalloc bug

2007-06-05 Thread young dave

Hi,

Are you able to reproduce this in 2.6.22-rc4?

The kmalloc in dmesg is in skbuff.c:pskb_expand_head, I will try
2.6.22-rc4 ASAP.


Re: [PATCH] neaten lguest boot code (again)

2007-06-05 Thread Andrew Morton
On Wed, 06 Jun 2007 12:37:39 +1000 Rusty Russell <[EMAIL PROTECTED]> wrote:

> (This cleanup seems to have gotten lost in rc3-mm1?  It was in rc2-mm1
> called lguest-the-host-code-update-for-mm-simplify-boot_params.patch)
> 
> Andrew patched up lguest after the boot parameters became a proper
> structure, but in fact it can be considerably neatened.
> 
> Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
> ---
>  drivers/lguest/lguest.c |3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> ===
> --- a/drivers/lguest/lguest.c
> +++ b/drivers/lguest/lguest.c
> @@ -444,8 +444,7 @@ __init void lguest_init(void *boot)
>  {
>   /* Copy boot parameters first. */
>   memcpy(&boot_params, boot, PARAM_SIZE);
> - memcpy(boot_command_line,
> -__va(*(unsigned long *)((void *)&boot_params + NEW_CL_POINTER)),
> + memcpy(boot_command_line, __va(boot_params.hdr.cmd_line_ptr),
>  COMMAND_LINE_SIZE);
>  
>   paravirt_ops.name = "lguest";

I dropped it because it depended upon mm-simplify-boot_params.patch which
for some reason disappeared but now appears to have reappeared.

Shall resurrect.


Re: [BUG] 2.6.22-rc3-mm1 remove bluetooth usb adapter caused kmalloc bug

2007-06-05 Thread Andrew Morton
On Wed, 6 Jun 2007 01:56:01 + "young dave" <[EMAIL PROTECTED]> wrote:

> Hi,
> when I remove the USB bluetooth adapter, the kernel reports a bug:
> 
> /* these two lines are printk messages I added in net/bluetooth/hci_core.c */
> 
> #before free dev: c3758430
> #after free dev
> 
> =
> BUG kmalloc-1024: Poison overwritten
> -
> 
> INFO: 0xc3758440-0xc3758440. First byte 0x6a instead of 0x6b
> INFO: Allocated in hci_alloc_dev+0x1f/0x80 [bluetooth] age=6094 cpu=0 pid=9586
> INFO: Freed in device_release+0x82/0x90 age=0 cpu=0 pid=7
> INFO: Slab 0xc106eb00 used=6 fp=0xc3758430 flags=0x400020c3
> INFO: Object 0xc3758430 @offset=1072 fp=0xc375b240

I don't get it.  device_release() doesn't call kfree() or kmem_cache_free()
or any such thing.

> Bytes b4 0xc3758420:  00 00 00 00 b9 ea 09 00 5a 5a 5a 5a 5a 5a 5a 5a
>   Object 0xc3758430:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
>   Object 0xc3758440:  6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
>   Object 0xc3758450:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
>   Object 0xc3758460:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
>   Object 0xc3758470:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
>   Object 0xc3758480:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
>   Object 0xc3758490:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
>   Object 0xc37584a0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
>  Redzone 0xc3758830:  bb bb bb bb
>  Padding 0xc3758858:  5a 5a 5a 5a 5a 5a 5a 5a
> 
>  [] check_bytes_and_report+0xaa/0xe0
>  [] check_object+0x198/0x1e0
>  [] alloc_debug_processing+0x9c/0x130
>  [] __slab_alloc+0x10a/0x220
>  [] pskb_expand_head+0x4a/0x140
>  [] __kmalloc+0x72/0x80
>  [] pskb_expand_head+0x4a/0x140
>  [] pskb_expand_head+0x4a/0x140
>  [] alloc_debug_processing+0xc6/0x130
>  [] netlink_broadcast+0x68/0x370
>  [] kobject_uevent_env+0x32d/0x4e0
>  [] kobject_uevent_env+0x414/0x4e0
>  [] d_kill+0x3f/0x60
>  [] dput+0x1a/0xf0
>  [] device_del+0x1ac/0x2e0
>  [] usb_disable_device+0x78/0xf0
>  [] usb_disconnect+0x93/0xf0
>  [] hub_port_connect_change+0x2f2/0x3b0
>  [] hub_events+0x212/0x420
>  [] autoremove_wake_function+0x0/0x50
>  [] hub_thread+0x25/0x110
>  [] autoremove_wake_function+0x0/0x50
>  [] autoremove_wake_function+0x0/0x50
>  [] hub_thread+0x0/0x110
>  [] kthread+0x59/0xa0
>  [] kthread+0x0/0xa0
>  [] kernel_thread_helper+0x7/0x14
>  ===
> FIX kmalloc-1024: Restoring 0xc3758440-0xc3758440=0x6b
> 
> FIX kmalloc-1024: Marking all objects used

Could perhaps be due to bluetooth-postpone-hci_dev-unregistration.patch,
but I don't see how.  (But that patch looks a bit dodgy wrt module unload
so I think I'll drop it).
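One way to read the report: the object starts at 0xc3758430 and the damaged byte sits at 0xc3758440, i.e. offset 16, where 0x6a was found in place of the 0x6b pattern. That is the value a single decrement of a poisoned counter would leave behind. The sketch below reproduces the pattern in userspace; it is not the allocator's checking code, the helper names are made up, and it assumes a little-endian machine with a 4-byte int so that the low byte of the counter is the first byte in memory.

```c
#include <stddef.h>
#include <string.h>

#define POISON_FREE 0x6b  /* byte pattern shown in the report for the freed object */

/* Return the offset of the first byte that is not POISON_FREE, or -1. */
static long first_poison_mismatch(const unsigned char *obj, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (obj[i] != POISON_FREE)
            return (long)i;
    return -1;
}

/* Simulate dropping a reference through a stale pointer: decrement the
 * int stored at `offset` inside an object that is already poisoned. */
static void stale_decrement(unsigned char *obj, size_t offset)
{
    int count;

    memcpy(&count, obj + offset, sizeof(count));
    count--;  /* 0x6b6b6b6b becomes 0x6b6b6b6a */
    memcpy(obj + offset, &count, sizeof(count));
}
```

Poisoning a buffer with 0x6b, calling stale_decrement(buf, 16) and then scanning it reports a mismatch at offset 16 with byte 0x6a, the same signature as in the dump. This only shows that the dump is consistent with a late decrement; it does not identify which code path performed it.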

Are you able to reproduce this in 2.6.22-rc4?



[PATCH] neaten lguest boot code (again)

2007-06-05 Thread Rusty Russell
(This cleanup seems to have gotten lost in rc3-mm1?  It was in rc2-mm1
called lguest-the-host-code-update-for-mm-simplify-boot_params.patch)

Andrew patched up lguest after the boot parameters became a proper
structure, but in fact it can be considerably neatened.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 drivers/lguest/lguest.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

===
--- a/drivers/lguest/lguest.c
+++ b/drivers/lguest/lguest.c
@@ -444,8 +444,7 @@ __init void lguest_init(void *boot)
 {
/* Copy boot parameters first. */
memcpy(&boot_params, boot, PARAM_SIZE);
-   memcpy(boot_command_line,
-  __va(*(unsigned long *)((void *)&boot_params + NEW_CL_POINTER)),
+   memcpy(boot_command_line, __va(boot_params.hdr.cmd_line_ptr),
   COMMAND_LINE_SIZE);
 
paravirt_ops.name = "lguest";




[PATCH -rt] Fix TASKLET_STATE_SCHED WARN_ON()

2007-06-05 Thread john stultz
Hey Ingo,
So we've been seeing the following trace fairly frequently on our SMP
boxes when running kernbench:

BUG: at kernel/softirq.c:639 __tasklet_action()

Call Trace:
 [] dump_trace+0xaa/0x32a
 [] show_trace+0x41/0x5c
 [] dump_stack+0x15/0x17
 [] __tasklet_action+0xdf/0x12e
 [] tasklet_action+0x27/0x29
 [] ksoftirqd+0x16c/0x271
 [] kthread+0xf5/0x128
 [] child_rip+0xa/0x12


Paul also pointed this out awhile back: http://lkml.org/lkml/2007/2/25/1


Anyway, I think I finally found the issue. It's a bit hard to explain,
but the idea is while __tasklet_action is running the tasklet function
on CPU1, if a call to tasklet_schedule() on CPU2 is made, and if right
after we mark the TASKLET_STATE_SCHED bit we are preempted,
__tasklet_action on CPU1 might be able to re-run the function, clear the
bit and unlock the tasklet before CPU2 enters __tasklet_common_schedule.
Once __tasklet_common_schedule locks the tasklet, we will add the
tasklet to the list with the TASKLET_STATE_SCHED *unset*. 

I've verified this race occurs w/ a WARN_ON in
__tasklet_common_schedule().


This fix avoids this race by making sure *after* we've locked the
tasklet that the STATE_SCHED bit is set before adding it to the list.

Does it look ok to you?

thanks
-john

Signed-off-by: John Stultz <[EMAIL PROTECTED]>

Index: 2.6-rt/kernel/softirq.c
===
--- 2.6-rt.orig/kernel/softirq.c2007-06-05 18:30:54.0 -0700
+++ 2.6-rt/kernel/softirq.c 2007-06-05 18:36:44.0 -0700
@@ -544,10 +544,17 @@ static void inline
 __tasklet_common_schedule(struct tasklet_struct *t, struct tasklet_head *head, 
unsigned int nr)
 {
if (tasklet_trylock(t)) {
-   WARN_ON(t->next != NULL);
-   t->next = head->list;
-   head->list = t;
-   raise_softirq_irqoff(nr);
+   /* We may have been preempted before tasklet_trylock
+* and __tasklet_action may have already run.
+* So double check the sched bit while the tasklet
+* is locked before adding it to the list.
+*/
+   if (test_bit(TASKLET_STATE_SCHED, &t->state)) {
+   WARN_ON(t->next != NULL);
+   t->next = head->list;
+   head->list = t;
+   raise_softirq_irqoff(nr);
+   }
tasklet_unlock(t);
}
 }
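
For illustration only, the interleaving described above can be replayed in a single thread. This is a minimal sketch, not the kernel code: the struct layout, the STATE_* bit values, and the helper names common_schedule() and replay_race() are assumptions made up for the example, and plain integer operations stand in for the kernel's atomic bit operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's tasklet state bits. */
enum { STATE_SCHED = 1u << 0, STATE_RUN = 1u << 1 };

struct tasklet {
	struct tasklet *next;
	unsigned int state;
};

struct tasklet_head {
	struct tasklet *list;
};

/* Model of tasklet_trylock(): take the RUN bit if nobody holds it. */
static bool tasklet_trylock(struct tasklet *t)
{
	if (t->state & STATE_RUN)
		return false;
	t->state |= STATE_RUN;
	return true;
}

static void tasklet_unlock(struct tasklet *t)
{
	t->state &= ~STATE_RUN;
}

/*
 * Queue the tasklet only if SCHED is still set once the lock is held.
 * Returns true when the tasklet was added to the list.
 */
static bool common_schedule(struct tasklet *t, struct tasklet_head *head)
{
	bool queued = false;

	if (tasklet_trylock(t)) {
		if (t->state & STATE_SCHED) {
			assert(t->next == NULL);
			t->next = head->list;
			head->list = t;
			queued = true;
		}
		tasklet_unlock(t);
	}
	return queued;
}

/* Replays the two-CPU interleaving sequentially. */
static bool replay_race(bool other_cpu_clears_sched)
{
	struct tasklet t = { NULL, 0 };
	struct tasklet_head head = { NULL };

	t.state |= STATE_SCHED;          /* CPU2: tasklet_schedule() sets the bit */
	if (other_cpu_clears_sched)
		t.state &= ~STATE_SCHED; /* CPU1: the action reruns and clears it */
	return common_schedule(&t, &head);
}
```

With the re-check in place, replay_race(true) leaves the list empty instead of queueing a tasklet whose SCHED bit is unset, while replay_race(false) queues it as usual.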




[RFC] Extend Linux to support proportional-share scheduling

2007-06-05 Thread Li, Tong N
Hi all,

I've ported my code to mainline 2.6.21.3. You can get it at
http://www.cs.duke.edu/~tongli/linux/. As I said before, the intent of
the patch is not to compete with CFS and SD because the design relies on
the underlying scheduler for interactive performance. The goal here is
to present a complementary design that can ensure stronger MP fairness,
which I think is lacking in the existing proposals. Here's a brief
overview of the design (I call it Trio for the lack of a better name).
Any comments or suggestions will be highly appreciated.

Trio extends the existing Linux scheduler with support for
proportional-share scheduling. It uses a scheduling algorithm, called
Distributed Weighted Round-Robin (DWRR), which retains the existing
scheduler design as much as possible, and extends it to achieve
proportional fairness with O(1) time complexity and a constant error
bound, compared to the ideal fair scheduling algorithm. The goal of Trio
is not to improve interactive performance; rather, it relies on the
existing scheduler for interactivity and extends it to support MP
proportional fairness.

Trio has two unique features: (1) it enables users to control shares of
CPU time for any thread or group of threads (e.g., a process, an
application, etc.), and (2) it enables fair sharing of CPU time across
multiple CPUs. For example, with ten tasks running on eight CPUs, Trio
allows each task to take an equal fraction of the total CPU time,
whereas no existing scheduler achieves such fairness. These features
enable Trio to complement the mainline scheduler and other proposals
such as CFS and SD to enable greater user flexibility and stronger
fairness.
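
As a worked version of the ten-tasks-on-eight-CPUs example, the ideal proportional share can be written as ncpus * w_i / sum(w). The sketch below only illustrates that arithmetic; it is not taken from the Trio patch. The function name ideal_share() and the integer weight representation are assumptions, the weights must not all be zero, and the fact that a single thread cannot use more than one CPU is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Ideal CPU time for task i, in units of CPUs: ncpus * w_i / sum(w). */
static double ideal_share(const unsigned int *weights, size_t ntasks,
			  size_t i, unsigned int ncpus)
{
	unsigned long total = 0;

	for (size_t k = 0; k < ntasks; k++)
		total += weights[k];
	return (double)ncpus * weights[i] / (double)total;
}
```

With ten equal weights and eight CPUs, every task's ideal share comes out at 8/10 of a CPU.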

  tong


[BUG] 2.6.22-rc3-mm1 remove bluetooth usb adapter caused kmalloc bug

2007-06-05 Thread young dave

Hi,
When I remove the USB Bluetooth adapter, the kernel reports a bug:

/* these two lines are printk messages I added in net/bluetooth/hci_core.c */

#before free dev: c3758430
#after free dev

=
BUG kmalloc-1024: Poison overwritten
-

INFO: 0xc3758440-0xc3758440. First byte 0x6a instead of 0x6b
INFO: Allocated in hci_alloc_dev+0x1f/0x80 [bluetooth] age=6094 cpu=0 pid=9586
INFO: Freed in device_release+0x82/0x90 age=0 cpu=0 pid=7
INFO: Slab 0xc106eb00 used=6 fp=0xc3758430 flags=0x400020c3
INFO: Object 0xc3758430 @offset=1072 fp=0xc375b240

Bytes b4 0xc3758420:  00 00 00 00 b9 ea 09 00 5a 5a 5a 5a 5a 5a 5a 5a
¹ê..
 Object 0xc3758430:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

 Object 0xc3758440:  6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
jkkk
 Object 0xc3758450:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

 Object 0xc3758460:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

 Object 0xc3758470:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

 Object 0xc3758480:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

 Object 0xc3758490:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

 Object 0xc37584a0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

Redzone 0xc3758830:  bb bb bb bb

Padding 0xc3758858:  5a 5a 5a 5a 5a 5a 5a 5a

[] check_bytes_and_report+0xaa/0xe0
[] check_object+0x198/0x1e0
[] alloc_debug_processing+0x9c/0x130
[] __slab_alloc+0x10a/0x220
[] pskb_expand_head+0x4a/0x140
[] __kmalloc+0x72/0x80
[] pskb_expand_head+0x4a/0x140
[] pskb_expand_head+0x4a/0x140
[] alloc_debug_processing+0xc6/0x130
[] netlink_broadcast+0x68/0x370
[] kobject_uevent_env+0x32d/0x4e0
[] kobject_uevent_env+0x414/0x4e0
[] d_kill+0x3f/0x60
[] dput+0x1a/0xf0
[] device_del+0x1ac/0x2e0
[] usb_disable_device+0x78/0xf0
[] usb_disconnect+0x93/0xf0
[] hub_port_connect_change+0x2f2/0x3b0
[] hub_events+0x212/0x420
[] autoremove_wake_function+0x0/0x50
[] hub_thread+0x25/0x110
[] autoremove_wake_function+0x0/0x50
[] autoremove_wake_function+0x0/0x50
[] hub_thread+0x0/0x110
[] kthread+0x59/0xa0
[] kthread+0x0/0xa0
[] kernel_thread_helper+0x7/0x14
===
FIX kmalloc-1024: Restoring 0xc3758440-0xc3758440=0x6b

FIX kmalloc-1024: Marking all objects used
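
To make the report easier to read: the object starts at 0xc3758430 and the bad byte is at 0xc3758440, i.e. offset 16, where 0x6b has become 0x6a. The sketch below shows one way such a pattern can arise, under the assumption (not a diagnosis) that a 32-bit little-endian counter at that offset is decremented after the object has been freed and poisoned. The helper names are made up for the example and the scan is a simplified stand-in for the allocator's poison check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POISON_FREE 0x6b	/* fill byte seen in the dump above */

/* Offset of the first byte that differs from the poison, or -1 if intact. */
static long first_bad_byte(const uint8_t *obj, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (obj[i] != POISON_FREE)
			return (long)i;
	return -1;
}

/* Decrement a 32-bit little-endian counter stored at obj + off. */
static void dec_le32(uint8_t *obj, size_t off)
{
	uint32_t v = (uint32_t)obj[off] |
		     (uint32_t)obj[off + 1] << 8 |
		     (uint32_t)obj[off + 2] << 16 |
		     (uint32_t)obj[off + 3] << 24;

	v--;
	obj[off]     = (uint8_t)(v & 0xff);
	obj[off + 1] = (uint8_t)((v >> 8) & 0xff);
	obj[off + 2] = (uint8_t)((v >> 16) & 0xff);
	obj[off + 3] = (uint8_t)((v >> 24) & 0xff);
}

/* Poison a 1024-byte object, touch it afterwards, then scan it. */
static long simulate_late_decrement(void)
{
	uint8_t obj[1024];

	memset(obj, POISON_FREE, sizeof(obj));	/* object is freed and poisoned */
	dec_le32(obj, 16);			/* stale decrement at offset 16 */
	return first_bad_byte(obj, sizeof(obj));
}
```

Decrementing 0x6b6b6b6b gives 0x6b6b6b6a, so only the lowest byte changes, which matches the "6a 6b 6b 6b" at the start of the second dump row.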


Re: [patch] cpusets: do not allow TIF_MEMDIE tasks to allocate globally

2007-06-05 Thread Christoph Lameter
On Tue, 5 Jun 2007, David Rientjes wrote:

> mems_allowed.  Regardless, we should not allow allocations outside of the 
> cpuset because we have marked it TIF_MEMDIE and we don't have any explicit 
> guarantee that it is exiting yet and not mlock'ing an excessive amount of 
> memory that exceeds the capacity of all cpuset nodes.

Hmmm... But we have sent it a SIGKILL. If the process is following 
conventions then it is exiting. Of course the process could be abusing the 
system and attempting to OOM the whole system as an act of revenge for 
being killed, but isn't this a bit far-fetched?





libata - wrong IDE cable detection with dvd burner

2007-06-05 Thread Sean

Hi there,

Using 2.6.22-rc3 and finding it impossible to get the UDMA/66
connection advertised by a new dvd burner.   Substituting an old
hard drive in place of the DVD yields UDMA/100.  Yet with the
same cable, libata refuses to do more than /33 complaining that
it's a 40-wire cable.  I've already twiddled everything that
looks like it might be a factor in the bios.

This is an Intel D865GBF motherboard (dmesg below) with onboard
ide, however the exact same problem occurred when connecting this
dvd burner to the ide port on a promise sata pci board that also
lives in this system.

I found someone else having this problem here[1] but the patch
that worked in that situation appears to already be applied in
this kernel.  Not sure it makes a difference, but the kernel
is compiled without modules.  Any help would be appreciated; I'm
happy to do whatever is needed to diagnose the problem further.

Thanks,
Sean

[1] http://www.ussg.iu.edu/hypermail/linux/kernel/0703.3/0349.html

dmesg (ata1.00: limited to UDMA/33 due to 40-wire cable):

Linux version 2.6.22-rc3 ([EMAIL PROTECTED]) (gcc version 4.1.2 (Gentoo 4.1.2)) 
#6 SMP PREEMPT Sun Jun 3 14:45:47 EDT 2007
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e6000 - 0010 (reserved)
 BIOS-e820: 0010 - 7ff2fc00 (usable)
 BIOS-e820: 7ff2fc00 - 7ff3 (ACPI NVS)
 BIOS-e820: 7ff3 - 7ff4 (ACPI data)
 BIOS-e820: 7ff4 - 7fff (ACPI NVS)
 BIOS-e820: 7fff - 8000 (reserved)
 BIOS-e820: fecf - fecf1000 (reserved)
 BIOS-e820: fed2 - feda (reserved)
1151MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000ff780
Entering add_active_range(0, 0, 524079) 0 entries of 256 used
Zone PFN ranges:
  DMA 0 -> 4096
  Normal   4096 ->   229376
  HighMem229376 ->   524079
early_node_map[1] active PFN ranges
0:0 ->   524079
On node 0 totalpages: 524079
  DMA zone: 32 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 4064 pages, LIFO batch:0
  Normal zone: 1760 pages used for memmap
  Normal zone: 223520 pages, LIFO batch:31
  HighMem zone: 2302 pages used for memmap
  HighMem zone: 292401 pages, LIFO batch:31
DMI 2.3 present.
ACPI: RSDP 000F61B0, 0014 (r0 ACPIAM)
ACPI: RSDT 7FF3, 0038 (r1 INTEL  D865GBF  20040622 MSFT   97)
ACPI: FACP 7FF30200, 0081 (r2 INTEL  D865GBF  20040622 MSFT   97)
ACPI: DSDT 7FF30370, 4231 (r1 INTEL  D865GBF 1 MSFT  10D)
ACPI: FACS 7FF4, 0040
ACPI: APIC 7FF30300, 0068 (r1 INTEL  D865GBF  20040622 MSFT   97)
ACPI: ASF! 7FF345B0, 0099 (r16 LEGEND I865PASF1 MSFT  10D)
ACPI: TCPA 7FF34649, 0034 (r1 INTEL  TBLOEMID1 MSFT   97)
ACPI: WDDT 7FF3467D, 0040 (r1 INTEL  OEMWDDT 1 MSFT  10D)
ACPI: PM-Timer IO Port: 0x408
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:2 APIC version 20
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 2, version 32, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 8800 (gap: 8000:7ecf)
Built 1 zonelists.  Total pages: 519985
Kernel command line: root=/dev/sda3
mapped APIC to d000 (fee0)
mapped IOAPIC to c000 (fec0)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 16384 bytes)
Detected 2793.325 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 2071136k/2096316k available (3714k kernel code, 23980k reserved, 2314k 
data, 300k init, 1178812k highmem)
virtual kernel memory layout:
fixmap  : 0xfff4f000 - 0xf000   ( 704 kB)
pkmap   : 0xff80 - 0xffc0   (4096 kB)
vmalloc : 0xf880 - 0xff7fe000   ( 111 MB)
lowmem  : 0xc000 - 0xf800   ( 896 MB)
  .init : 0xc06ea000 - 0xc0735000   ( 300 kB)
  .data : 0xc04a092c - 0xc06e3294   (2314 kB)
  .text : 0xc010 - 0xc04a092c   (3714 kB)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 5588.51 BogoMIPS (lpj=2794258)
Security Framework 

Re: [patch] cpusets: do not allow TIF_MEMDIE tasks to allocate globally

2007-06-05 Thread David Rientjes
On Tue, 5 Jun 2007, Christoph Lameter wrote:

> On Tue, 5 Jun 2007, David Rientjes wrote:
> 
> > Obviously GFP_KERNEL allocations can allocate regardless of our memory 
> > exclusivity, but the point is that a job in one exclusive cpuset should 
> > not have the ability to affect the performance (in terms of reclaim and 
> > swap), memory usage, or survival of jobs in other exclusive cpusets 
> > because it was out of memory.
> 
> Right but the process is terminating thus only requiring limited resources
> to get finished. The process does not have the ability to affect the other 
> cpusets. I.e. it cannot directly allocate outside of the cpuset. The 
> system has that capability and the system is handling the termination of 
> the process and should terminate the process in a clean way if possible.
> 

And those limited resources should be available in the difference between 
low and no watermarks as defined by each zone in the cpuset's 
mems_allowed.  Regardless, we should not allow allocations outside of the 
cpuset because we have marked it TIF_MEMDIE and we don't have any explicit 
guarantee that it is exiting yet and not mlock'ing an excessive amount of 
memory that exceeds the capacity of all cpuset nodes.

David


Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-05 Thread Siddha, Suresh B
On Tue, Jun 05, 2007 at 04:57:07PM -0700, Darrick J. Wong wrote:
> On Tue, Jun 05, 2007 at 02:14:51PM -0700, Siddha, Suresh B wrote:
>  
> > Can you send us your system's dmesg as well as the output of /proc/interrupts?
> 
> http://sweaglesw.net/~djwong/docs/dmesg
> http://sweaglesw.net/~djwong/docs/interrupts

Didn't find anything wrong in that information. Can you try this
appended debug patch and see if you see this error msg in dmesg, when you
hit the bug? Thanks.

diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index d8bfe31..3409c1f 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -720,10 +720,13 @@ static int assign_irq_vector(int irq, cpumask_t mask)
 {
int err;
unsigned long flags;
+   int cpu = smp_processor_id();
 
spin_lock_irqsave(&vector_lock, flags);
err = __assign_irq_vector(irq, mask);
spin_unlock_irqrestore(&vector_lock, flags);
+   if (err && !cpu_isset(cpu, cpu_online_map))
+   printk("assigning irq to a vector failed : %d\n", err);
return err;
 }
 


Re: [PATCH] paravirt: helper to disable all IO space

2007-06-05 Thread Jeremy Fitzhardinge
Jeremy Fitzhardinge wrote:
> In a virtual environment, device drivers such as legacy IDE will waste
> quite a lot of time probing for their devices which will never appear.
> This helper function allows a paravirt implementation to lay claim to
> the whole iomem and ioport space, thereby disabling all device drivers
> trying to claim IO resources.
>
> Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
> Cc: Rusty Russell <[EMAIL PROTECTED]>
>
> diff -r 83c67f9402b5 arch/i386/kernel/paravirt.c
> --- a/arch/i386/kernel/paravirt.c Tue Jun 05 17:41:04 2007 -0700
> +++ b/arch/i386/kernel/paravirt.c Tue Jun 05 18:17:29 2007 -0700
> @@ -227,6 +227,39 @@ static int __init print_banner(void)
>   return 0;
>  }
>  core_initcall(print_banner);
> +
> +static struct resource reserve_ioports = {
> + .start = 0,
> + .end = IO_SPACE_LIMIT,
> + .name = "paravirt-ioport",
> + .flags = IORESOURCE_IO | IORESOURCE_BUSY,
> +};
> +
> +static struct resource reserve_iomem = {
> + .start = 0,
> + .end = -1,
> + .name = "paravirt-iomem",
> + .flags = IORESOURCE_MEM | IORESOURCE_BUSY,
> +};
> +
> +/*
> + * Reserve the whole legacy IO space to prevent any legacy drivers
> + * from wasting time probing for their hardware.  This is a fairly
> + * brute-force approach to disabling all non-virtual drivers.
> + * 
> + * Note that this must be called very early to have any effect.
> + */
> +int paravirt_disable_iospace(void)
> +{
> + int ret = 0;
> +
> + ret = request_resource(&ioport_resource, &reserve_ioports);
> + if (ret == 0)
> + ret = request_resource(&iomem_resource, &reserve_iomem);
> +
> + return ret;
> +}
> +
>  
>  struct paravirt_ops paravirt_ops = {
>   .name = "bare hardware",
> diff -r 83c67f9402b5 arch/i386/xen/setup.c
> --- a/arch/i386/xen/setup.c   Tue Jun 05 17:41:04 2007 -0700
> +++ b/arch/i386/xen/setup.c   Tue Jun 05 18:17:29 2007 -0700
>   

Oops.  This was supposed to be in the corresponding Xen patch.

J


[PATCH] paravirt: helper to disable all IO space

2007-06-05 Thread Jeremy Fitzhardinge
In a virtual environment, device drivers such as legacy IDE will waste
quite a lot of time probing for their devices which will never appear.
This helper function allows a paravirt implementation to lay claim to
the whole iomem and ioport space, thereby disabling all device drivers
trying to claim IO resources.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>

diff -r 83c67f9402b5 arch/i386/kernel/paravirt.c
--- a/arch/i386/kernel/paravirt.c   Tue Jun 05 17:41:04 2007 -0700
+++ b/arch/i386/kernel/paravirt.c   Tue Jun 05 18:17:29 2007 -0700
@@ -227,6 +227,39 @@ static int __init print_banner(void)
return 0;
 }
 core_initcall(print_banner);
+
+static struct resource reserve_ioports = {
+   .start = 0,
+   .end = IO_SPACE_LIMIT,
+   .name = "paravirt-ioport",
+   .flags = IORESOURCE_IO | IORESOURCE_BUSY,
+};
+
+static struct resource reserve_iomem = {
+   .start = 0,
+   .end = -1,
+   .name = "paravirt-iomem",
+   .flags = IORESOURCE_MEM | IORESOURCE_BUSY,
+};
+
+/*
+ * Reserve the whole legacy IO space to prevent any legacy drivers
+ * from wasting time probing for their hardware.  This is a fairly
+ * brute-force approach to disabling all non-virtual drivers.
+ * 
+ * Note that this must be called very early to have any effect.
+ */
+int paravirt_disable_iospace(void)
+{
+   int ret = 0;
+
+   ret = request_resource(&ioport_resource, &reserve_ioports);
+   if (ret == 0)
+   ret = request_resource(&iomem_resource, &reserve_iomem);
+
+   return ret;
+}
+
 
 struct paravirt_ops paravirt_ops = {
.name = "bare hardware",
diff -r 83c67f9402b5 arch/i386/xen/setup.c
--- a/arch/i386/xen/setup.c Tue Jun 05 17:41:04 2007 -0700
+++ b/arch/i386/xen/setup.c Tue Jun 05 18:17:29 2007 -0700
@@ -8,12 +8,14 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
diff -r 83c67f9402b5 include/asm-i386/paravirt.h
--- a/include/asm-i386/paravirt.h   Tue Jun 05 17:41:04 2007 -0700
+++ b/include/asm-i386/paravirt.h   Tue Jun 05 18:17:29 2007 -0700
@@ -262,6 +262,7 @@ unsigned paravirt_patch_insns(void *site
 unsigned paravirt_patch_insns(void *site, unsigned len,
  const char *start, const char *end);
 
+int paravirt_disable_iospace(void);
 
 /*
  * This generates an indirect call based on the operation type number.
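
As a rough illustration of why claiming the whole range up front disables later probes, here is a toy model of a busy-range table. It is a sketch under stated assumptions and not the kernel's resource tree: struct space, claim(), and MAX_CLAIMS are invented for the example, the table is flat rather than hierarchical, and 0xffff stands in for IO_SPACE_LIMIT.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_CLAIMS 8

struct range {
	unsigned long start, end;	/* inclusive bounds */
};

struct space {
	struct range claims[MAX_CLAIMS];
	size_t nclaims;
};

/* Toy stand-in for request_resource(): 0 on success, -EBUSY on overlap. */
static int claim(struct space *s, unsigned long start, unsigned long end)
{
	for (size_t i = 0; i < s->nclaims; i++)
		if (start <= s->claims[i].end && s->claims[i].start <= end)
			return -EBUSY;
	if (s->nclaims == MAX_CLAIMS)
		return -ENOMEM;
	s->claims[s->nclaims++] = (struct range){ start, end };
	return 0;
}
```

Once claim(&io, 0, 0xffff) has succeeded, a later claim for a legacy range such as 0x1f0-0x1f7 fails with -EBUSY, which is the effect the helper relies on to stop drivers from probing.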



[PATCH] xen: disable all non-virtual devices

2007-06-05 Thread Jeremy Fitzhardinge
A domU Xen environment has no non-virtual drivers, so make sure
they're all disabled at once.  This noticeably speeds up boot time.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>

diff -r c79da0042c7d arch/i386/xen/setup.c
--- a/arch/i386/xen/setup.c Tue Jun 05 18:17:30 2007 -0700
+++ b/arch/i386/xen/setup.c Tue Jun 05 18:17:59 2007 -0700
@@ -93,4 +93,6 @@ void __init xen_arch_setup(void)
/* fill cpus_possible with all available cpus */
xen_fill_possible_map();
 #endif
+
+   paravirt_disable_iospace();
 }




Re: [PATCH 03/22] 2.6.22-rc3 perfmon2 : new system calls support

2007-06-05 Thread David Rientjes
On Tue, 5 Jun 2007, Stephane Eranian wrote:

> > > +static int pfm_task_incompatible(struct pfm_context *ctx, struct 
> > > task_struct *task)
> > > +{
> > > + /*
> > > +  * no kernel task or task not owned by caller
> > > +  */
> > > + if (!task->mm) {
> > > + PFM_DBG("cannot attach to kernel thread [%d]", task->pid);
> > > + return -EPERM;
> > > + }
> > 
> > This isn't a sufficient check for whether a task is owned by the caller.
> > 
> 
> The comment is missing a part. The ownership test is done by 
> ptrace_may_attach().
> The above test is about checking for a kernel-only thread.
> 

Ok.

> > > +int pfm_get_task(struct pfm_context *ctx, pid_t pid, struct task_struct 
> > > **task)
> > > +{
> > 
> > This function could be marked static even though it's exported through 
> > perfmon.h in patch 13.  It is unreferenced elsewhere.
> > 
> No because it is used in another module on IA-64 (for compatibility with 
> older versions).
> 

Is this ia64 patch the one you mentioned that you did not post to LKML 
because it was too large in patch 0?  Is there any way you could break 
that patch up itself and post it for comments?

> > Why can't this be done with just struct task_struct *task as the third 
> > formal and change the assignment later to task = p?
> > 
> Because we need to carry the errno back: ESRCH or EPERM.
> 

Your formal is "struct task_struct **task" yet the only actual to this 
function is the memory address of a pointer to a single struct task_struct 
(i.e. it's never passed an array of struct task_struct pointers, which 
"struct task_struct **task" is).

And since you only ever use this as *task to get the pointer, you can 
change the formal to just be "struct task_struct *task" and then pass in a 
pointer to a single struct task_struct.

> > > + if (check_mask & PFM_CMD_STOPPED) {
> > > +
> > > + spin_unlock_irqrestore(&ctx->lock, local_flags);
> > > +
> > > + /*
> > > +  * check that the thread is ptraced AND STOPPED
> > > +  */
> > > + ret = ptrace_check_attach(task, 0);
> > > +
> > > + spin_lock_irqsave(&ctx->lock, new_flags);
> > > +
> > > + /*
> > > +  * flags may be different than when we released the lock
> > > +  */
> > > + *flags = new_flags;
> > 
> > You can't do this, you'll need to either separate these functions out by 
> > having pfm_check_task_state() indicate by a return value that 
> > ptrace_check_attach() should be checked or that we've already failed, or 
> > come up with a non-locking solution.
> > 
> Are you worried about the local_flags vs. new_flags?
> I think your solution would be cleaner, I will see what I can do.
> 

It should be simple if this is broken down into two smaller functions, the 
latter of which is called only upon a successful return of the former.

> > > +
> > > +asmlinkage long sys_pfm_write_pmcs(int fd, struct pfarg_pmc __user 
> > > *ureq, int count)
> > > +{
> > > + struct pfm_context *ctx;
> > > + struct file *filp;
> > > + struct pfarg_pmc pmcs[PFM_PMC_STK_ARG];
> > > + struct pfarg_pmc *req;
> > > + void *fptr;
> > > + unsigned long flags;
> > > + size_t sz;
> > > + int ret, fput_needed;
> > > +
> > 
> > Could this have a stack overflow on powerpc?
> > 
> The PFM_PMC_STK_ARG is per-arch, so you could choose a very low value.
> I think it is set to 4. pfarg_pmc is 48 bytes and pfarg_pmd is 176 bytes
> regardless of LP64 vs. ILP32.
> 

Stack overflows like that are annoying to track down and powerpc has the 
highest PFM_PMC_STK_ARG of the entire patchset.

> Thanks for your feedback.
> 

I'm looking forward to seeing the next patchset and I'll give it a 
thorough test run on x86_64.  It'd probably be best to base that patchset 
off 2.6.22 when it's released.

David


Re: [patch] cpusets: do not allow TIF_MEMDIE tasks to allocate globally

2007-06-05 Thread Christoph Lameter
On Tue, 5 Jun 2007, David Rientjes wrote:

> Obviously GFP_KERNEL allocations can allocate regardless of our memory 
> exclusivity, but the point is that a job in one exclusive cpuset should 
> not have the ability to affect the performance (in terms of reclaim and 
> swap), memory usage, or survival of jobs in other exclusive cpusets 
> because it was out of memory.

Right but the process is terminating thus only requiring limited resources
to get finished. The process does not have the ability to affect the other 
cpusets. I.e. it cannot directly allocate outside of the cpuset. The 
system has that capability and the system is handling the termination of 
the process and should terminate the process in a clean way if possible.

> > Processes stuck in D state is another issue with reliability.
> But it's a reality that we need to respect.  It happens and when it does
> it has the potential to hamper other cpusets that we set up to be exclusive 
> themselves.

The fact that processes hang because of some software deficiency is an 
exceptional failure scenario. The system in general is not fully 
operational anymore anyways. Typically one or the other lock is held which 
makes certain kernel functionality inaccessible.



Re: [PATCH 1/1] containers: implement nsproxy containers subsystem

2007-06-05 Thread Serge E. Hallyn
Quoting Pavel Emelianov ([EMAIL PROTECTED]):
> Serge E. Hallyn wrote:
> > Quoting Pavel Emelianov ([EMAIL PROTECTED]):
> >> Serge E. Hallyn wrote:
> >>> >From 190ea72d213393dd1440643b2b87b5b2128dff87 Mon Sep 17 00:00:00 2001
> >>> From: Serge E. Hallyn <[EMAIL PROTECTED]>
> >>> Date: Mon, 4 Jun 2007 14:18:52 -0400
> >>> Subject: [PATCH 1/1] containers: implement nsproxy containers subsystem
> >>>
> >>> When a task enters a new namespace via a clone() or unshare(), a new
> >>> container is created and the task moves into it.  This enables
> >> I have a design question.
> >>
> >> How does the child that has a new namespace guess what id
> >> this namespace has in containers?
> > 
> > parse /proc/$$/container
> 
> Ok.
> 
> > So more likely the parent would have to grab the cloned pid of the
> > child, parse /proc/$$/container, then rename the container.
> 
> Child can happen to die before this and we'll have an orphaned
> container. I mean, it will be deletable, but its name will be unknown.
> 
> Maybe it's better to get the container's id from the pid of the new task?

Here is a patch to do so:

From f42eeba62ec06544841070f55f6f1625c1216652 Mon Sep 17 00:00:00 2001
From: Serge E. Hallyn <[EMAIL PROTECTED]>
Date: Tue, 5 Jun 2007 10:25:05 -0400
Subject: [PATCH 1/1] containers: implement namespace tracking subsystem (v3)

When a task enters a new namespace via a clone() or unshare(), a
new container is created and the task moves into it.

This version names containers which are automatically created
using container_clone() as "node_<pid>" where pid is the pid of
the unsharing or cloned process.  (Thanks Pavel for the idea)
This is safe because if the process unshares again, it will
create
/containers/(...)/node_<pid>/node_<pid>

The only possibilities (AFAICT) for a -EEXIST on unshare are
1. pid wraparound
2. a process fails an unshare, then tries again.
Case 1 is unlikely enough that I ignore it (at least for now).
In case 2, the node_<pid> will be empty and can be rmdir'ed to
make the subsequent unshare() succeed.

Changelog:
Name cloned containers as "node_<pid>".

(no idea where to start versioning, calling this v3 "at random")

Signed-off-by: Serge E. Hallyn <[EMAIL PROTECTED]>
---
 include/linux/container_subsys.h |6 ++
 include/linux/nsproxy.h  |7 +++
 init/Kconfig |9 
 kernel/Makefile  |1 +
 kernel/container.c   |   27 +++
 kernel/ns_container.c|   99 ++
 kernel/nsproxy.c |   16 ++
 7 files changed, 155 insertions(+), 10 deletions(-)
 create mode 100644 kernel/ns_container.c

diff --git a/include/linux/container_subsys.h b/include/linux/container_subsys.h
index 8fea7cf..9861751 100644
--- a/include/linux/container_subsys.h
+++ b/include/linux/container_subsys.h
@@ -24,3 +24,9 @@ SUBSYS(debug)
 #endif
 
 /* */
+
+#ifdef CONFIG_CONTAINER_NS
+SUBSYS(ns)
+#endif
+
+/* */
diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index 189e0dc..8be975b 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -54,4 +54,11 @@ static inline void exit_task_namespaces(struct task_struct 
*p)
put_nsproxy(ns);
}
 }
+
+#ifdef CONFIG_CONTAINER_NS
+int ns_container_clone(struct task_struct *tsk);
+#else
+static inline int ns_container_clone(struct task_struct *tsk) { return 0; }
+#endif
+
 #endif
diff --git a/init/Kconfig b/init/Kconfig
index 5861ad9..d79c505 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -355,6 +355,15 @@ config CONTAINER_CPUACCT
  Provides a simple Resource Controller for monitoring the
  total CPU consumed by the tasks in a container
 
+config CONTAINER_NS
+bool "Namespace container subsystem"
+select CONTAINERS
+help
+  Provides a simple namespace container subsystem to
+  provide hierarchical naming of sets of namespaces,
+  for instance virtual servers and checkpoint/restart
+  jobs.
+
 config PROC_PID_CPUSET
bool "Include legacy /proc/<pid>/cpuset file"
depends on CPUSETS
diff --git a/kernel/Makefile b/kernel/Makefile
index f73b3d3..34f2345 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -40,6 +40,7 @@ obj-$(CONFIG_CONTAINERS) += container.o
 obj-$(CONFIG_CONTAINER_DEBUG) += container_debug.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
 obj-$(CONFIG_CONTAINER_CPUACCT) += cpu_acct.o
+obj-$(CONFIG_CONTAINER_NS) += ns_container.o
 obj-$(CONFIG_IKCONFIG) += configs.o
 obj-$(CONFIG_STOP_MACHINE) += stop_machine.o
 obj-$(CONFIG_AUDIT) += audit.o auditfilter.o
diff --git a/kernel/container.c b/kernel/container.c
index 6f80487..988cd8b 100644
--- a/kernel/container.c
+++ b/kernel/container.c
@@ -2302,12 +2302,6 @@ void container_exit(struct task_struct *tsk, int run_callbacks)
put_css_group_taskexit(cg);
 }
 
-static atomic_t namecnt;
-static void get_unused_name(char *buf)
-{
-   sprintf(buf, "node%d", 

Re: [patch] cpusets: do not allow TIF_MEMDIE tasks to allocate globally

2007-06-05 Thread David Rientjes
On Tue, 5 Jun 2007, Christoph Lameter wrote:

> Exclusive is not as absolute as you may think. There is also the 
> GFP_KERNEL exception.
> 

Memory exclusivity with respect to cpusets should guarantee that memory 
nodes do not overlap with siblings if they are marked with mems_exclusive.  
The patch simply preserves that behavior through the time period between 
when the OOM killer issues a SIGKILL and the task is exiting and marked 
with PF_EXITING.

Obviously GFP_KERNEL allocations can allocate regardless of our memory 
exclusivity, but the point is that a job in one exclusive cpuset should 
not have the ability to affect the performance (in terms of reclaim and 
swap), memory usage, or survival of jobs in other exclusive cpusets 
because it was out of memory.

> Processes stuck in D state is another issue with reliability.
> 

But it's a reality that we need to respect.  It happens and when it does
it has the potential to hamper other cpusets that we set up to be exclusive 
themselves.

David


Re: [PATCH 2/2] UML - Fix kernel stack size on x86_64

2007-06-05 Thread Andrew Morton
On Tue, 5 Jun 2007 20:37:52 -0400 Jeff Dike <[EMAIL PROTECTED]> wrote:

> On Tue, Jun 05, 2007 at 05:00:01PM -0700, Andrew Morton wrote:
> > On Tue, 5 Jun 2007 16:50:55 -0400
> > Jeff Dike <[EMAIL PROTECTED]> wrote:
> > 
> > > [ This is 2.6.22 material ]
> > > 
> > > Having KERNEL_STACK_ORDER in defconfig overrides the value provided by
> > > Kconfig, breaking UML/x86_64, which wants 2 page stacks.
> 
> > That means the Kconfig rules are wrong, surely?
> 
> I'm far from a Kconfig expert,

Me either.  I learn enough for the problem at hand, then instaforget it
again.  Kinda like perl.

> but what I have is
> 
> config KERNEL_STACK_ORDER
>   int "Kernel stack size order"
>   default 1 if 64BIT
>   default 0 if !64BIT
> 
> which seems reasonably clear and simple...
> 

hm, OK, there's the problem.  This is an offered-to-the-user config option.

If you do

-   int "Kernel stack size order"
+   int

then this rule will no longer be offered to the user and `make oldconfig'
(actually anythingconfig) will override whatever happens to be in .config
for KERNEL_STACK_ORDER.

I'm not sure if that's actually what you want, but if the current situation
is that a random CONFIG_KERNEL_STACK_ORDER=0 left over in .config will
break the kernel at runtime then I think something sterner than editing
defconfig is needed?



Re: [patch] cpusets: do not allow TIF_MEMDIE tasks to allocate globally

2007-06-05 Thread Paul Jackson
> Sure, that behavior is unchanged.  We're relying on 
> nearest_exclusive_ancestor() to determine if such nodes overlap.

Ok ... my points on cpusets semantics having been heard,
I stand back down on the matter of memory semantics, where
I am not the master.

Thanks.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: [patch] cpusets: do not allow TIF_MEMDIE tasks to allocate globally

2007-06-05 Thread David Rientjes
On Tue, 5 Jun 2007, Paul Jackson wrote:

> Well, I can't speak to the 'real' meaning of TIF_MEMDIE with authority,
> but I can speak to the meaning of cpuset flags.
> 
> The mem_exclusive flag doesn't mean this.
> 
> It means that you cannot overlap the memory of a sibling cpuset.
> 

Which, with this patch, we will respect for tasks marked TIF_MEMDIE as 
well.

> You will, necessarily, still overlap the memory of your ancestor cpusets.
> 

And that's what nearest_exclusive_ancestor() determines later on if we're 
not requesting GFP_HARDWALL.

> Whether or not you make any use of the mem_exclusive flag, you still
> get the same (limited) guarantees of memory usage -- namely that your
> memory won't be used by tasks in non-overlapping cpusets, with some
> exceptions, such as:
>  1) memory handed out to interrupt code,
>  2) memory handed out for GFP_ATOMIC requests, and
>  3) tasks marked PF_EXITING -- will soon free up memory
> 

This is precisely the point: we already respect PF_EXITING tasks with 
their ability to allocate outside their own cpuset.  That gets set in 
do_exit() when a task is in receipt of the SIGKILL from the OOM killer 
during the exit path.  Between these time periods (the time when we issue 
the OOM SIGKILL and the time we're marked PF_EXITING in do_exit()), we 
should not allow allocations outside of our cpuset because we do not yet 
have the guarantee that they will exit synchronously or reliably.

> Tasks in cpusets ancestor to your tasks cpuset can always, easily,
> use memory on the same nodes your task is on.
> 

Sure, that behavior is unchanged.  We're relying on 
nearest_exclusive_ancestor() to determine if such nodes overlap.

David


Re: [PATCH 2/6] lguest tsc fix

2007-06-05 Thread Rusty Russell
On Tue, 2007-06-05 at 20:15 +0200, Andi Kleen wrote:
> On Wed, Jun 06, 2007 at 12:56:36AM +1000, Rusty Russell wrote:
> > In recent -mm kernels, the TSC capability cannot be disabled,
> > resulting in a divide by zero error in the normal sched_clock.
> 
> That will hopefully change. I hope hpa will just undo this.
> 
> > 
> > The correct fix is to have a special lguest sched_clock
> > implementation: this is as simple as it gets.
> 
> But gettimeofday might still use it. Is that ok for you?

Yes, I don't think it will be any worse than before.  Basically the
guest uses a dumb jiffies-based clock.  The TSC patch later in this same
series changes it back to use the native sched_clock, and overrides
tsc_khz instead (based on information from the host).
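
For readers unfamiliar with the idea, a jiffies-based clock of the kind described above can be sketched in user space as follows. This is only an illustration, not the lguest code: the function name, the TOY_NSEC_PER_SEC macro and the explicit hz parameter are all assumptions made so the example stands alone.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NSEC_PER_SEC UINT64_C(1000000000)

/*
 * Hypothetical helper: derive a nanosecond timestamp purely from a tick
 * counter. The resolution is one tick (1/hz seconds), which is why such
 * a clock is "dumb" compared with a TSC-based one.
 */
static uint64_t toy_jiffies_sched_clock(uint64_t jiffies, unsigned int hz)
{
	assert(hz != 0);
	return jiffies * (TOY_NSEC_PER_SEC / hz);
}
```

With hz = 1000, each tick advances the clock by one millisecond, so two events inside the same tick get identical timestamps.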

Thanks,
Rusty.




Re: [PATCH 3/6] lguest suppress IDE probing

2007-06-05 Thread Rusty Russell
On Tue, 2007-06-05 at 17:07 +0100, Alan Cox wrote:
> On Wed, 06 Jun 2007 00:58:03 +1000
> Rusty Russell <[EMAIL PROTECTED]> wrote:
> 
> > The IDE probe is the slowest part of boot: by suppressing it we cut
> > boot from 3 seconds to half a second.
> 
> NAK NAK NAK NAK NAK

Hi Alan!

> > AFAICT, the commandline is the easiest way to suppress the probing.
> 
> Gaa ... Rusty surely you have more taste than that.

Indeed, but it got attention 8)

> See include/asm-foo/ide.h
> 
> Add an lguest check to go with the pci check and for the lguest case just
> say "no controllers"

Actually, Jeremy suggested claiming the entire IO space.  That works for
Xen domU too, and makes some amount of sense.

> Better yet just don't compile in the old IDE stuff, lguest doesn't have a
> PCI or ISA bus anyway.

Sure, but the "run the same kernel as guest and host" is a really nice
feature.

> Alternatively make the IDE I/O space return 0xFF and it'll skip them
> anyway.

Hmm, every "in" should be returning 0xFFs, but I still get the delay and
the probing.  Xen domU gets it too.

Thanks!
Rusty.



2.6.22-rc4: (mtrr) WARNING: ... Section mismatch

2007-06-05 Thread Sebastian Kemper
Hello all,

I get these warnings compiling rc4 (see attached out.log file). I'm used 
to warnings compiling my (vanilla) kernels, but these look more 
suspicious than usual to me.

If this is nothing to worry about, please disregard. Otherwise please 
CC.

Regards
Sebastian

-- 
"When the going gets weird, the weird turn pro."

Hunter S. Thompson
makeccache bzImage modules modules_install
  CHK include/linux/version.h
  CHK include/linux/utsrelease.h
  CALL    scripts/checksyscalls.sh
  CHK include/linux/compile.h
  MODPOST vmlinux
WARNING: arch/i386/kernel/built-in.o(.text+0x9307): Section mismatch: reference to .init.text:amd_init_mtrr (between 'mtrr_bp_init' and 'mtrr_attrib_to_str')
WARNING: arch/i386/kernel/built-in.o(.text+0x930c): Section mismatch: reference to .init.text:cyrix_init_mtrr (between 'mtrr_bp_init' and 'mtrr_attrib_to_str')
WARNING: arch/i386/kernel/built-in.o(.text+0x9311): Section mismatch: reference to .init.text:centaur_init_mtrr (between 'mtrr_bp_init' and 'mtrr_attrib_to_str')
WARNING: arch/i386/kernel/built-in.o(.text+0xa3b4): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'generic_get_mtrr')
WARNING: arch/i386/kernel/built-in.o(.text+0xa3cb): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'generic_get_mtrr')
WARNING: arch/i386/kernel/built-in.o(.text+0xa3f3): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'generic_get_mtrr')
Kernel: arch/i386/boot/bzImage is ready  (#1)
  Building modules, stage 2.
  MODPOST 35 modules
  INSTALL drivers/media/common/saa7146.ko
  INSTALL drivers/media/common/saa7146_vv.ko
  INSTALL drivers/media/dvb/dvb-core/dvb-core.ko
  INSTALL drivers/media/dvb/frontends/stv0299.ko
  INSTALL drivers/media/dvb/ttpci/dvb-ttpci.ko
  INSTALL drivers/media/dvb/ttpci/ttpci-eeprom.ko
  INSTALL drivers/media/video/compat_ioctl32.ko
  INSTALL drivers/media/video/v4l1-compat.ko
  INSTALL drivers/media/video/v4l2-common.ko
  INSTALL drivers/media/video/video-buf.ko
  INSTALL drivers/media/video/videodev.ko
  INSTALL drivers/scsi/scsi_mod.ko
  INSTALL drivers/scsi/scsi_wait_scan.ko
  INSTALL drivers/scsi/sd_mod.ko
  INSTALL drivers/usb/storage/usb-storage.ko
  INSTALL fs/fat/fat.ko
  INSTALL fs/isofs/isofs.ko
  INSTALL fs/nls/nls_base.ko
  INSTALL fs/nls/nls_cp850.ko
  INSTALL fs/nls/nls_iso8859-15.ko
  INSTALL fs/udf/udf.ko
  INSTALL fs/vfat/vfat.ko
  INSTALL net/ipv4/netfilter/ip_tables.ko
  INSTALL net/ipv4/netfilter/ipt_LOG.ko
  INSTALL net/ipv4/netfilter/ipt_REJECT.ko
  INSTALL net/ipv4/netfilter/iptable_filter.ko
  INSTALL net/ipv4/netfilter/iptable_mangle.ko
  INSTALL net/ipv4/netfilter/nf_conntrack_ipv4.ko
  INSTALL net/netfilter/nf_conntrack.ko
  INSTALL net/netfilter/x_tables.ko
  INSTALL net/netfilter/xt_conntrack.ko
  INSTALL net/netfilter/xt_multiport.ko
  INSTALL net/netfilter/xt_pkttype.ko
  INSTALL net/netfilter/xt_state.ko
  INSTALL net/netfilter/xt_tcpudp.ko
if [ -r System.map -a -x /sbin/depmod ]; then /sbin/depmod -ae -F System.map  2.6.22-rc4; fi
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.
 
Linux section_eight 2.6.21.3 #2 Mon Jun 4 14:19:37 CEST 2007 i686 AMD 
Sempron(tm)   2400+ AuthenticAMD GNU/Linux
 
Gnu C                  4.1.2
Gnu make               3.81
binutils               2.16.1
util-linux             2.12r
mount                  2.12r
module-init-tools      3.2.2
e2fsprogs              1.39
Linux C Library        > libc.2.5
Dynamic linker (ldd)   2.5
Procps                 3.2.7
Net-tools              1.60
Kbd                    1.12
Sh-utils               6.7
udev                   104
Modules Loaded ipt_LOG xt_tcpudp xt_state xt_pkttype ipt_REJECT 
xt_multiport xt_conntrack iptable_mangle nf_conntrack_ipv4 nf_conntrack 
iptable_filter ip_tables x_tables af_packet ndiswrapper nls_iso8859_15 
nls_cp850 vfat fat nls_base sd_mod lirc_serial lirc_dev stv0299 dvb_ttpci 
dvb_core saa7146_vv video_buf saa7146 videodev v4l2_common v4l1_compat 
ttpci_eeprom usb_storage scsi_mod
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.22-rc4
# Wed Jun  6 02:47:34 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
# CONFIG_EXPERIMENTAL is not set
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_SYSVIPC_SYSCTL=y
# 

signalfd API issues (was Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes)

2007-06-05 Thread Nicholas Miell
On Tue, 2007-06-05 at 17:37 -0700, Davide Libenzi wrote:
> On Tue, 5 Jun 2007, Nicholas Miell wrote:
> 
> > On Tue, 2007-06-05 at 17:11 -0700, Davide Libenzi wrote:
> > > On Tue, 5 Jun 2007, Nicholas Miell wrote:
> > > 
> > > > Yes, that's certainly wrong, but that's an implementation issue. I was
> > > > more concerned about the design of the API.
> > > > 
> > > > Naively, I would expect a read on a signalfd to return either process
> > > > signals or thread signals targeted towards the thread doing the read.
> > > > 
> > > > What it actually does (delivering process signals or thread signals
> > > > targeted towards the thread that created the signalfd) is weird.
> > > > 
> > > > For one, it means you can't create a single signalfd, stick it in an
> > > > epoll set, and then wait on that set from multiple threads.
> > > 
> > > In your box threads do share the sighand, don't they? :)
> > > 
> > 
> > I have no idea what you're trying to say, but it doesn't appear to
> > address the issue I raise.
> 
> "For one, it means you can't create a single signalfd, stick it in an
>  epoll set, and then wait on that set from multiple threads."
> 
> Why not?
> A signalfd, like I said, is attached to the sighand, that is shared by the 
> threads.
> 
> 

POSIX requires the following:

"At the time of generation, a determination shall be made whether the
signal has been generated for the process or for a specific thread
within the process. Signals which are generated by some action
attributable to a particular thread, such as a hardware fault, shall be
generated for the thread that caused the signal to be generated. Signals
that are generated in association with a process ID or process group ID
or an asynchronous event, such as terminal activity, shall be generated
for the process."

In practice, this means that signals like SIGSEGV/SIGFPE/SIGILL/etc. and
signals generated by pthread_kill() (i.e. tkill() or tgkill()) are
directed to a specific thread, while other signals are directed to the
process as a whole and serviced by any thread that isn't blocking that
specific signal.

Linux accomplishes this by having two lists of pending signals --
current->pending is the per-thread list and
current->signal->shared_pending is the process-wide list.

dequeue_signal(tsk, ...) looks for signals first in tsk->pending and
then in tsk->signal->shared_pending.

sys_signalfd() stores current in signalfd_ctx. signalfd_read() passes
that context to signalfd_dequeue, which passes that saved
task_struct pointer to dequeue_signal.

This means that a signalfd will deliver signals targeted towards either
the original thread that created that signalfd, or signals targeted
towards the process as a whole.

This means that a single signalfd is not adequate to handle signal
delivery for all threads in a process, because signals targeted towards
threads other than the thread that originally created the signalfd will
never be queued to that signalfd.
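
To make the argument concrete, here is a small user-space model of the lookup order described above. It is a sketch under stated assumptions, not kernel code: every toy_* name is invented for illustration, pending signals are reduced to a 32-bit mask, and locking, blocked masks and siginfo are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the process-wide state (signal->shared_pending). */
struct toy_process {
	uint32_t shared_pending;
};

/* Toy stand-in for a thread (tsk->pending plus a link to its process). */
struct toy_thread {
	uint32_t pending;
	struct toy_process *proc;
};

/* Remove the lowest pending signal from *set; return it, or 0 if none. */
static int toy_take(uint32_t *set)
{
	for (int sig = 1; sig < 32; sig++) {
		uint32_t bit = UINT32_C(1) << sig;

		if (*set & bit) {
			*set &= ~bit;
			return sig;
		}
	}
	return 0;
}

/* Same order as described above: per-thread list first, then shared. */
static int toy_dequeue_signal(struct toy_thread *tsk)
{
	int sig = toy_take(&tsk->pending);

	return sig ? sig : toy_take(&tsk->proc->shared_pending);
}

/* The context remembers the thread that created the signalfd. */
struct toy_signalfd {
	struct toy_thread *creator;
};

/* Reads always dequeue on behalf of the creator, whoever calls read(). */
static int toy_signalfd_read(struct toy_signalfd *ctx)
{
	assert(ctx && ctx->creator);
	return toy_dequeue_signal(ctx->creator);
}
```

In this model, a signal set in another thread's pending mask is never returned by toy_signalfd_read(), while signals in the creator's mask or the shared mask are.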

Is my analysis wrong?

-- 
Nicholas Miell <[EMAIL PROTECTED]>



[AGPGART] intel_agp: Add support for G33, Q33 and Q35 chipsets

2007-06-05 Thread Wang Zhenyu

Dave,

This patch adds PCI IDs for the G33, Q33 and Q35 chips, and updates the
GTT size and stolen memory size detection for these chips.

It is based on the following intel-agp patches currently in the -mm tree:
intel_agp-cleanup-intel-private-data.patch
intel_agp-cleanup-intel-private-data-update.patch
intel_agp-use-table-for-device-probe.patch
intel_agp-use-table-for-device-probe-update.patch
intel_agp-add-support-for-965gme-gle.patch
intel_agp-add-support-for-945gme.patch
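
As a reading aid for the new GTT size detection, the decode can be sketched outside the driver like this. The three macro values are copied from the agp.h hunk in this patch; the helper name and the standalone form are assumptions for illustration only, and the extra 4KB the driver adds for the BIOS popup space is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in the agp.h hunk of this patch. */
#define G33_PGETBL_SIZE_MASK (3 << 8)
#define G33_PGETBL_SIZE_1M   (1 << 8)
#define G33_PGETBL_SIZE_2M   (2 << 8)

/* Hypothetical helper: GTT size in KB encoded in the gmch_ctrl word. */
static int toy_g33_gtt_size_kb(uint16_t gmch_ctrl)
{
	switch (gmch_ctrl & G33_PGETBL_SIZE_MASK) {
	case G33_PGETBL_SIZE_1M:
		return 1024;
	case G33_PGETBL_SIZE_2M:
		return 2048;
	default:
		/* Unknown encoding: the patch assumes 512KB here. */
		return 512;
	}
}
```

Bits outside the mask are ignored, so unrelated fields in gmch_ctrl do not affect the result.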

Signed-off-by: Wang Zhenyu <[EMAIL PROTECTED]>
---
 drivers/char/agp/agp.h   |6 +++-
 drivers/char/agp/intel-agp.c |   78 -
 2 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/drivers/char/agp/agp.h b/drivers/char/agp/agp.h
index fdbca25..35ab1a9 100644
--- a/drivers/char/agp/agp.h
+++ b/drivers/char/agp/agp.h
@@ -176,7 +176,7 @@ struct agp_bridge_data {
 #define I830_GMCH_MEM_MASK 0x1
 #define I830_GMCH_MEM_64M  0x1
 #define I830_GMCH_MEM_128M 0
-#define I830_GMCH_GMS_MASK 0x70
+#define I830_GMCH_GMS_MASK 0xF0
 #define I830_GMCH_GMS_DISABLED 0x00
 #define I830_GMCH_GMS_LOCAL0x10
 #define I830_GMCH_GMS_STOLEN_512   0x20
@@ -231,6 +231,10 @@ struct agp_bridge_data {
 #define I965_PGETBL_SIZE_512KB (0 << 1)
 #define I965_PGETBL_SIZE_256KB (1 << 1)
 #define I965_PGETBL_SIZE_128KB (2 << 1)
+#define G33_PGETBL_SIZE_MASK(3 << 8)
+#define G33_PGETBL_SIZE_1M  (1 << 8)
+#define G33_PGETBL_SIZE_2M  (2 << 8)
+
 #define I810_DRAM_CTL  0x3000
 #define I810_DRAM_ROW_00x0001
 #define I810_DRAM_ROW_0_SDRAM  0x0001
diff --git a/drivers/char/agp/intel-agp.c b/drivers/char/agp/intel-agp.c
index 3c4a1c2..1c5ee4f 100644
--- a/drivers/char/agp/intel-agp.c
+++ b/drivers/char/agp/intel-agp.c
@@ -22,6 +22,12 @@
 #define PCI_DEVICE_ID_INTEL_82965GM_IG  0x2A02
 #define PCI_DEVICE_ID_INTEL_82965GME_IG 0x2A12
 #define PCI_DEVICE_ID_INTEL_82945GME_IG 0x27AE
+#define PCI_DEVICE_ID_INTEL_G33_HB  0x29C0
+#define PCI_DEVICE_ID_INTEL_G33_IG  0x29C2
+#define PCI_DEVICE_ID_INTEL_Q35_HB  0x29B0
+#define PCI_DEVICE_ID_INTEL_Q35_IG  0x29B2
+#define PCI_DEVICE_ID_INTEL_Q33_HB  0x29D0
+#define PCI_DEVICE_ID_INTEL_Q33_IG  0x29D2
 
 #define IS_I965 (agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82946GZ_HB || \
  agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82965G_1_HB || \
@@ -29,6 +35,9 @@
  agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82965G_HB || \
  agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82965GM_HB)
 
+#define IS_G33 (agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_G33_HB || \
+   agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_Q35_HB || \
+   agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_Q33_HB)
 
 extern int agp_memory_reserved;
 
@@ -55,6 +64,8 @@ extern int agp_memory_reserved;
 #define I915_PTEADDR   0x1C
 #define I915_GMCH_GMS_STOLEN_48M   (0x6 << 4)
 #define I915_GMCH_GMS_STOLEN_64M   (0x7 << 4)
+#define G33_GMCH_GMS_STOLEN_128M   (0x8 << 4)
+#define G33_GMCH_GMS_STOLEN_256M   (0x9 << 4)
 
 /* Intel 965G registers */
 #define I965_MSAC 0x62
@@ -448,6 +459,22 @@ static void intel_i830_init_gtt_entries(void)
size = 512;
}
size += 4; /* add in BIOS popup space */
+   } else if (IS_G33) {
+   /* G33's GTT size defined in gmch_ctrl */
+   switch (gmch_ctrl & G33_PGETBL_SIZE_MASK) {
+   case G33_PGETBL_SIZE_1M:
+   size = 1024;
+   break;
+   case G33_PGETBL_SIZE_2M:
+   size = 2048;
+   break;
+   default:
+   printk(KERN_INFO PFX "Unknown page table size 0x%x, "
+   "assuming 512KB\n",
+   (gmch_ctrl & G33_PGETBL_SIZE_MASK));
+   size = 512;
+   }
+   size += 4;
} else {
/* On previous hardware, the GTT size was just what was
 * required to map the aperture.
@@ -499,7 +526,8 @@ static void intel_i830_init_gtt_entries(void)
if (agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82915G_HB ||
agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82915GM_HB ||
agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82945G_HB ||
-   agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82945GM_HB || IS_I965 )
+   agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82945GM_HB ||
+   IS_I965 || IS_G33)
gtt_entries = MB(48) - KB(size);
else
gtt_entries = 0;
@@ -509,10 +537,23 @@ static void 

Re: [PATCH 4/6] lguest don't signal like crazy, use LHREQ_BREAK command

2007-06-05 Thread Matt Mackall
On Wed, Jun 06, 2007 at 10:07:46AM +1000, Rusty Russell wrote:
> On Tue, 2007-06-05 at 10:34 -0500, Matt Mackall wrote:
> > On Wed, Jun 06, 2007 at 01:00:06AM +1000, Rusty Russell wrote:
> > > We currently use a "waker" process: a child of the launcher which
> > > selects() on the incoming file descriptors.  It sends a SIGUSR1 to the
> > > launcher whenever select() returns to kick the launcher out of the
> > > kernel.
> > 
> > If I break out of lguest with three ctrl-Cs, this leaves one of the
> > lguest processes running with /dev/lguest held open.
> 
> This patch, or the previous version I sent?  The previous one had this
> issue, so this one takes some care to kill the waker and I haven't seen
> it since:
> 
>   /* Make sure waker is not blocked in BREAK */
>   u32 args[] = { LHREQ_BREAK, 0 };
>   close(waker_fd);
>   write(fd, args, sizeof(args));
>   exit(2);

Probably the one you sent earlier.

-- 
Mathematics is the supreme nostalgia of our time.


Re: [2/2] 2.6.22-rc4: known regressions with patches

2007-06-05 Thread Paul Mundt
On Tue, Jun 05, 2007 at 04:54:35PM +0200, Michal Piotrowski wrote:
> SATA/PATA
> 
> Subject: libata reset-seq merge broke sata_sil on sh
> References : http://lkml.org/lkml/2007/5/10/63
> Submitter  : Paul Mundt <[EMAIL PROTECTED]>
> Handled-By : Tejun Heo <[EMAIL PROTECTED]>
> Caused-By  : commit 4750def52cb2c21732dda9aa1d43a07db37b0186
> Patch  : http://lkml.org/lkml/2007/5/19/161
> Status : patch available
> 
Fixed by fd7fe701612e42fb8780d7bf61fbb0467a488c9b, which was already in
-rc3.


Re: [PATCH 4/4] mm: variable length argument support

2007-06-05 Thread Ollie Wild

OK.  It sounds like a healthy dose of comments is in order.  I'll
clean things up and send out a new patch sometime tonight or tomorrow.

Additional comments inline below:


> - len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES);
> - if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
> + len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
> + if (!len || len > MAX_ARG_STRLEN)

strnlen_user() is a scary function.  Please do remember that if the memory
we just strlen'ed is writeable by any user thread then that thread can at
any time invalidate the number which the kernel now holds.


At this point, we've already called setup_arg_pages(), so the user
memory is our own private copy.  No other threads can access it.


> - !(len = strnlen_user(compat_ptr(str), bprm->p))) {
> + !(len = strnlen_user(compat_ptr(str), MAX_ARG_STRLEN))) {
>   ret = -EFAULT;
>   goto out;
>   }
>
> - if (bprm->p < len)  {
> + if (MAX_ARG_STRLEN < len) {
>   ret = -E2BIG;
>   goto out;
>   }

Do we have an off-by-one here?  Should it be <=?


No, strnlen_user() returns N+1 (where N==MAX_ARG_STRLEN) if the string
is too large.
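
The return convention is easy to get wrong, so here is a user-space analogue of it. This is not the kernel's strnlen_user(): the toy_* names are made up, and the real function also returns 0 on a fault, which this sketch cannot reproduce.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns the string length INCLUDING the terminating NUL, or n + 1
 * when no NUL is found within the first n bytes.
 */
static size_t toy_strnlen_incl_nul(const char *s, size_t n)
{
	assert(s != NULL);
	for (size_t i = 0; i < n; i++) {
		if (s[i] == '\0')
			return i + 1;
	}
	return n + 1;
}

/* Same shape as the check in the patch: reject only when limit < len. */
static int toy_arg_too_big(const char *s, size_t limit)
{
	return limit < toy_strnlen_incl_nul(s, limit);
}
```

With a limit of 4, "abc" (4 bytes with its NUL) is accepted and "abcd" is rejected, so using <= would wrongly reject a string that exactly fits.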


If not, then this code is relying upon the string's terminating \0 coming
from userspace?  If so, that's buggy: userspace can overwrite the \0 after
we ran the strnlen_user(), perhaps, and confound the kernel?


If that's the case, then we will fail to copy the null terminator, and
the string will munge into the following string.  Since we always
access this data via the various userspace access routines, we will
either return an error on a later operation, or the new process will
segfault shortly upon starting.


> + vma_adjust(vma, new_start, old_end,
> +vma->vm_pgoff - (-shift >> PAGE_SHIFT), NULL);

hm, a right-shift of a negated unsigned value.  That's pretty unusual.  I
hope you know what you're doing ;)


This is correct.  In this case, shift is already populated with a
negative, wrapped unsigned value.  The -shift is needed to make it
positive before the bitwise shift.
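
A standalone illustration of that arithmetic, in case it helps review. The helper name and TOY_PAGE_SHIFT (assuming 4KB pages) are mine, not from the patch.

```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4KB pages */

/*
 * `shift` is a negative displacement stored in an unsigned long, i.e.
 * the wrapped result of (new_start - old_start) when the stack moves
 * down. Unary minus on an unsigned value is well defined (modulo 2^N)
 * and recovers the positive byte distance, which is then converted to
 * a page count.
 */
static unsigned long toy_pages_moved(unsigned long shift)
{
	return -shift >> TOY_PAGE_SHIFT;
}
```

Shifting the wrapped value without negating it first would give a huge page count rather than the small positive one wanted here.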


>  #define EXTRA_STACK_VM_PAGES 20  /* random */
>
> +/* Finalizes the stack vm_area_struct.  The flags and permissions are updated,
> + * the stack is optionally relocated, and some extra space is added.
> + */

That's better.

But what extra space is added, and why?


We add EXTRA_STACK_VM_PAGES.  To be honest, I think neither of us know
why this is done.  It's just what the old code did, so we preserved
it.

Ollie


Re: [PATCH 2/2] UML - Fix kernel stack size on x86_64

2007-06-05 Thread Jeff Dike
On Tue, Jun 05, 2007 at 05:00:01PM -0700, Andrew Morton wrote:
> On Tue, 5 Jun 2007 16:50:55 -0400
> Jeff Dike <[EMAIL PROTECTED]> wrote:
> 
> > [ This is 2.6.22 material ]
> > 
> > Having KERNEL_STACK_ORDER in defconfig overrides the value provided by
> > Kconfig, breaking UML/x86_64, which wants 2 page stacks.

> That means the Kconfig rules are wrong, surely?

I'm far from a Kconfig expert, but what I have is

config KERNEL_STACK_ORDER
int "Kernel stack size order"
default 1 if 64BIT
default 0 if !64BIT

which seems reasonably clear and simple...

Jeff

-- 
Work email - jdike at linux dot intel dot com


Re: 2.6.21-rt2..8 troubles

2007-06-05 Thread Rui Nuno Capela
Rui Nuno Capela wrote:
> Thomas Gleixner wrote:
>> On Fri, 2007-05-25 at 21:58 +0100, Rui Nuno Capela wrote:
>>> Is there anything I can do better to help myself figuring out this
>>> issue? As this is a  modern laptop such things like a serial console are
>>> unavailable, but it would be nice to track things up over netconsole
>>> perhaps?
>>>
>>> I just need some bright and nice directions now ;) Hope someone finds
>>> this worth of attention too. Meanwhile, I'll be happy with 2.6.21-rt1 :)
>> Can you boot with "hpet=disable" on the command line ?
>>
> 
> Nope. It doesn't seem to have significant effect. Same time-bomb
> behavior: after an indeterminate period of uptime, the system stops
> responding and cannot spawn new processes (current running ones still
> live on, strange).
> 
>> If that does not help, please provide the output of /proc/timer_list.
>>
> 
> This is with my latest iteration:
>   http://www.rncbc.org/datahub/config-2.6.21.1-rt8.0
> 
> Normal boot on which it behaves as badly as reported:
>   http://www.rncbc.org/datahub/dmesg-2.6.21.1-rt8.0
> 
> # cat /proc/timer_list
> Timer List Version: v0.3
> HRTIMER_MAX_CLOCK_BASES: 2
> now at 131736771907 nsecs
> 
> cpu: 0
>  clock 0:
>   .index:  0
>   .resolution: 1 nsecs
>   .get_time:   ktime_get_real
>   .offset: 1180213690448299114 nsecs
> active timers:
>  clock 1:
>   .index:  1
>   .resolution: 1 nsecs
>   .get_time:   ktime_get
>   .offset: 0 nsecs
> active timers:
>  #0: , tick_sched_timer, S:01
>  # expires at 13173700 nsecs [in 228093 nsecs]
>  #1: , it_real_fn, S:01
>  # expires at 131751277843 nsecs [in 14505936 nsecs]
>  #2: , hrtimer_wakeup, S:01
>  # expires at 131802703679 nsecs [in 65931772 nsecs]
>  #3: , hrtimer_wakeup, S:01
>  # expires at 131802705006 nsecs [in 65933099 nsecs]
>  #4: , hrtimer_wakeup, S:01
>  # expires at 132412838830 nsecs [in 676066923 nsecs]
>  #5: , it_real_fn, S:01
>  # expires at 137026607454 nsecs [in 5289835547 nsecs]
>  #6: , hrtimer_wakeup, S:01
>  # expires at 141381493725 nsecs [in 9644721818 nsecs]
>  #7: , hrtimer_wakeup, S:01
>  # expires at 170796028701 nsecs [in 39059256794 nsecs]
>   .expires_next   : 13173700 nsecs
>   .hres_active: 1
>   .nr_events  : 40634
>   .nohz_mode  : 2
>   .idle_tick  : 13172400 nsecs
>   .tick_stopped   : 0
>   .idle_jiffies   : 4294799020
>   .idle_calls : 178848
>   .idle_sleeps: 133212
>   .idle_entrytime : 131736069830 nsecs
>   .idle_sleeptime : 100895567465 nsecs
>   .last_jiffies   : 4294799033
>   .next_jiffies   : 4294799039
>   .idle_expires   : 13173600 nsecs
> jiffies: 4294799033
> 
> cpu: 1
>  clock 0:
>   .index:  0
>   .resolution: 1 nsecs
>   .get_time:   ktime_get_real
>   .offset: 1180213690448299114 nsecs
> active timers:
>  clock 1:
>   .index:  1
>   .resolution: 1 nsecs
>   .get_time:   ktime_get
>   .offset: 0 nsecs
> active timers:
>  #0: , hrtimer_wakeup, S:01
>  # expires at 131737067173 nsecs [in 295266 nsecs]
>  #1: , tick_sched_timer, S:01
>  # expires at 13173725 nsecs [in 478093 nsecs]
>  #2: , hrtimer_wakeup, S:01
>  # expires at 139151071745 nsecs [in 7414299838 nsecs]
>  #3: , hrtimer_wakeup, S:01
>  # expires at 139151133755 nsecs [in 7414361848 nsecs]
>  #4: , hrtimer_wakeup, S:01
>  # expires at 139151154005 nsecs [in 7414382098 nsecs]
>   .expires_next   : 131737067173 nsecs
>   .hres_active: 1
>   .nr_events  : 31510
>   .nohz_mode  : 2
>   .idle_tick  : 13173425 nsecs
>   .tick_stopped   : 0
>   .idle_jiffies   : 4294799030
>   .idle_calls : 151213
>   .idle_sleeps: 107018
>   .idle_entrytime : 131735193036 nsecs
>   .idle_sleeptime : 108256832194 nsecs
>   .last_jiffies   : 4294799032
>   .next_jiffies   : 4294799040
>   .idle_expires   : 13174300 nsecs
> jiffies: 4294799033
> 
> 
> Tick Device: mode: 1
> Clock Event Device: hpet
>  max_delta_ns:   2147483647
>  min_delta_ns:   3352
>  mult:   61496110
>  shift:  32
>  mode:   3
>  next_event: 13173700 nsecs
>  set_next_event: hpet_legacy_next_event
>  set_mode:   hpet_legacy_set_mode
>  event_handler:  tick_handle_oneshot_broadcast
> tick_broadcast_mask: 0003
> tick_broadcast_oneshot_mask: 0001
> 
> 
> Tick Device: mode: 1
> Clock Event Device: lapic
>  max_delta_ns:   806914928
>  min_delta_ns:   1442
>  mult:   44650051
>  shift:  32
>  mode:   1
>  next_event: 13173700 nsecs
>  set_next_event: lapic_next_event
>  set_mode:   lapic_timer_setup
>  event_handler:  hrtimer_interrupt
> 
> Tick Device: mode: 1
> Clock Event Device: lapic
>  max_delta_ns:   806914928
>  min_delta_ns:   1442
>  mult:   44650051
>  shift:  32
>  mode:   3
>  next_event: 131737067173 nsecs
>  set_next_event: lapic_next_event
>  set_mode:   lapic_timer_setup
>  event_handler:  hrtimer_interrupt
> --
> 
> 
> Alternate boot with hpet=disabled as suggested, but no 

Re: [PATCH 4/6] lguest don't signal like crazy, use LHREQ_BREAK command

2007-06-05 Thread Rusty Russell
On Tue, 2007-06-05 at 10:34 -0500, Matt Mackall wrote:
> On Wed, Jun 06, 2007 at 01:00:06AM +1000, Rusty Russell wrote:
> > We currently use a "waker" process: a child of the launcher which
> > selects() on the incoming file descriptors.  It sends a SIGUSR1 to the
> > launcher whenever select() returns to kick the launcher out of the
> > kernel.
> 
> If I break out of lguest with three ctrl-Cs, this leaves one of the
> lguest processes running with /dev/lguest held open.

This patch, or the previous version I sent?  The previous one had this
issue, so this one takes some care to kill the waker and I haven't seen
it since:

/* Make sure waker is not blocked in BREAK */
u32 args[] = { LHREQ_BREAK, 0 };
close(waker_fd);
write(fd, args, sizeof(args));
exit(2);

Thanks,
Rusty.



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes

2007-06-05 Thread Davide Libenzi
On Tue, 5 Jun 2007, Nicholas Miell wrote:

> On Tue, 2007-06-05 at 17:11 -0700, Davide Libenzi wrote:
> > On Tue, 5 Jun 2007, Nicholas Miell wrote:
> > 
> > > Yes, that's certainly wrong, but that's an implementation issue. I was
> > > more concerned about the design of the API.
> > > 
> > > Naively, I would expect a read on a signalfd to return either process
> > > signals or thread signals targeted towards the thread doing the read.
> > > 
> > > What it actually does (delivering process signals or thread signals
> > > targeted towards the thread that created the signalfd) is weird.
> > > 
> > > For one, it means you can't create a single signalfd, stick it in an
> > > epoll set, and then wait on that set from multiple threads.
> > 
> > In your box threads do share the sighand, don't they? :)
> > 
> 
> I have no idea what you're trying to say, but it doesn't appear to
> address the issue I raise.

"For one, it means you can't create a single signalfd, stick it in an
 epoll set, and then wait on that set from multiple threads."

Why not?
A signalfd, like I said, is attached to the sighand, that is shared by the 
threads.



- Davide




Re: [PATCH] fix race in AF_UNIX

2007-06-05 Thread David Miller
From: David Miller <[EMAIL PROTECTED]>
Date: Tue, 05 Jun 2007 00:02:47 -0700 (PDT)

> From: Miklos Szeredi <[EMAIL PROTECTED]>
> Date: Mon, 04 Jun 2007 11:45:32 +0200
> 
> > > A recv() on an AF_UNIX, SOCK_STREAM socket can race with a
> > > send()+close() on the peer, causing recv() to return zero, even though
> > > the sent data should be received.
> > > 
> > > > This happens if the send() and the close() are performed between
> > > skb_dequeue() and checking sk->sk_shutdown in unix_stream_recvmsg():
> > > 
> > > process A  skb_dequeue() returns NULL, there's no data in the socket queue
> > > process B  new data is inserted onto the queue by unix_stream_sendmsg()
> > > process B  sk->sk_shutdown is set to SHUTDOWN_MASK by unix_release_sock()
> > > > process A  sk->sk_shutdown is checked, unix_stream_recvmsg() returns zero
> > 
> > This is only part of the story.  It turns out, there are other races
> > involving the garbage collector, that can throw away perfectly good
> > packets with AF_UNIX sockets in them.
> > 
> > The problems arise when a socket goes from installed to in-flight or
> > > vice versa during garbage collection.  Since gc is done with a
> > spinlock held, this only shows up on SMP.
> > 
> > The following patch fixes it for me, but it's possibly the wrong
> > approach.
> > 
> > Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>

Concerning this specific patch I think we need to rethink it
a bit.

Holding a global mutex over recvmsg() calls under AF_UNIX is pretty
much a non-starter, this will kill performance for multi-threaded
apps.

One possible solution is for the garbage collection code to hold the
u->readlock while processing a socket, but be careful about deadlocks.

Anyone want to give that a try?
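
As a reference point for the behaviour the original report expects, here is
a small user-space sketch. The function name send_close_then_drain() is
hypothetical. It is single-threaded, so it cannot reproduce the race itself;
it only shows the semantics at stake: data queued before the peer closes
must be returned by recv() before recv() reports end-of-file with zero.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send `msg` on one end of an AF_UNIX stream socketpair, close that end
 * immediately, then drain the other end.  Returns the number of payload
 * bytes seen before recv() reported end-of-file (0), or -1 on error.
 */
long send_close_then_drain(const char *msg)
{
	int sv[2];
	char buf[64];
	size_t len = strlen(msg);
	long total = 0;
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	if (send(sv[0], msg, len, 0) != (ssize_t)len) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	close(sv[0]);	/* peer goes away right after sending */

	/* Queued data must come out first; only then may recv() return 0. */
	while ((n = recv(sv[1], buf, sizeof(buf), 0)) > 0)
		total += n;

	close(sv[1]);
	return n < 0 ? -1 : total;
}
```

A stress test for the race would run the send()+close() side in a second
process or thread and check that the drained byte count never comes up short.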


Re: libata & no PCI: dma_[un]map_single undefined

2007-06-05 Thread Valdis . Kletnieks
On Tue, 05 Jun 2007 18:16:25 EDT, Jeff Garzik said:
> On Tue, Jun 05, 2007 at 11:03:45PM +0100, Russell King wrote:
> > And rather than configuring your MUA to ignore the header...
> 
> > You're using mutt, mutt can be configured so.
> 
> So, you are seriously proposing that EVERYONE reconfigure their MUA,
> because you are sending out bad headers?
> 
> Don't you think that is an unscalable solution, and an imposition?

Not only is it unscalable, it almost by definition is anti-social.  The only
times I've actually seen one in the wild, it's because some subscriber to a
mailing list wishes to subvert the list's culture in a manner worse than
a Reply-To: header.  I considered adding support for Mail-Followup-To: to
the exmh MUA, but decided against it, because it would basically mean that
every time I got one, I'd have to curse and moan and put the To: and cc: back
the way everybody *else* on the list wanted those two headers to behave. Kind
of hard to motivate myself to write Tk/Tcl code that will just mean a *worse*
user experience for myself...




Re: usb-scanner-cameras kernel-2.6.22 and udev-095 problem

2007-06-05 Thread art

Sorry for not responding; I was busy (I updated my system to FC7).
On fc7-i386 and on fc-x86-64 (2.6.22-rc4-cfq7 SMP PREEMPT x86_64
GNU/Linux) I can't see the USB scanner.

I tested this with an hp-6300 and an Agfa snapscan-1212u.
Neither of them triggers creation of a /dev/scanner-x device.
I assume xsane cannot see the scanner without it.

(BTW, for FC7 guys: scons is not signed, and test-mpeg2 is not working
with raw1394.)


xboom


Re: [PATCH 1/3] Char: stallion, don't fail with less than max panels

2007-06-05 Thread Andrew Morton
On Tue,  5 Jun 2007 23:20:57 +0200 (CEST)
Jiri Slaby <[EMAIL PROTECTED]> wrote:

> stallion, don't fail with less than max panels
>

Why not?

What problem is this patch fixing, and how does it fix it?

What are the consequences of not having this patch in the kernel?

etc.  More complete changelogs, please.

> 
> diff --git a/drivers/char/stallion.c b/drivers/char/stallion.c
> index e45113a..265abad 100644
> --- a/drivers/char/stallion.c
> +++ b/drivers/char/stallion.c
> @@ -2172,7 +2172,7 @@ static int __devinit stl_initech(struct stlbrd *brdp)
>   }
>   status = inb(ioaddr + ECH_PNLSTATUS);
>   if ((status & ECH_PNLIDMASK) != nxtid)
> - goto err_fr;
> + break;
>   panelp = kzalloc(sizeof(struct stlpanel), GFP_KERNEL);
>   if (!panelp) {
>   printk("STALLION: failed to allocate memory "


Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes

2007-06-05 Thread Nicholas Miell
On Tue, 2007-06-05 at 17:11 -0700, Davide Libenzi wrote:
> On Tue, 5 Jun 2007, Nicholas Miell wrote:
> 
> > Yes, that's certainly wrong, but that's an implementation issue. I was
> > more concerned about the design of the API.
> > 
> > Naively, I would expect a read on a signalfd to return either process
> > signals or thread signals targeted towards the thread doing the read.
> > 
> > What it actually does (delivering process signals or thread signals
> > targeted towards the thread that created the signalfd) is weird.
> > 
> > For one, it means you can't create a single signalfd, stick it in an
> > epoll set, and then wait on that set from multiple threads.
> 
> In your box threads do share the sighand, don't they? :)
> 

I have no idea what you're trying to say, but it doesn't appear to
address the issue I raise.

-- 
Nicholas Miell <[EMAIL PROTECTED]>



Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes

2007-06-05 Thread Davide Libenzi
On Wed, 6 Jun 2007, Benjamin Herrenschmidt wrote:

> On Tue, 2007-06-05 at 15:50 -0700, Davide Libenzi wrote:
> > > What about the code in __dequeue_signal though ? That notifier thing
> > > is used by the DRI though I'm not sure what would happen if it acts
> > > on the wrong task.
> > 
> > Hmm, looking at the comments in block_all_signals(), it seems that
> > they're interested in the fact that a specific task dequeues the
> > signal. So, at first sight, it seems that such code should not be
> > executed if another task dequeues the message. What do you think?
> 
> Yes, I think the idea is that the DRM uses that to prevent signals to be
> delivered to the task that is blocking them with the notifier (I have no
> idea why they can't use the normal block mechanism for that... looks like
> a hack to me).
> 
> So I suppose it's fine, as long as you add a test of tsk == current to
> avoid calling it.

Are you going to send a patch, or should I do it?


- Davide




Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes

2007-06-05 Thread Davide Libenzi
On Tue, 5 Jun 2007, Nicholas Miell wrote:

> Yes, that's certainly wrong, but that's an implementation issue. I was
> more concerned about the design of the API.
> 
> Naively, I would expect a read on a signalfd to return either process
> signals or thread signals targeted towards the thread doing the read.
> 
> What it actually does (delivering process signals or thread signals
> targeted towards the thread that created the signalfd) is weird.
> 
> For one, it means you can't create a single signalfd, stick it in an
> epoll set, and then wait on that set from multiple threads.

In your box threads do share the sighand, don't they? :)



- Davide




Re: [PATCH] lib: Replace calls to __get_free_pages() with __get_dma_pages().

2007-06-05 Thread Andrew Morton
On Tue, 5 Jun 2007 16:58:57 -0400 (EDT)
"Robert P. J. Day" <[EMAIL PROTECTED]> wrote:

> Replace a couple calls to __get_free_pages() with the corresponding
> calls to __get_dma_pages().
> 
> Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]>
> 
> ---
> 
>   that's the lot of them.
> 
> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> index 10c13ad..8fc38dc 100644
> --- a/lib/swiotlb.c
> +++ b/lib/swiotlb.c
> @@ -201,8 +201,7 @@ swiotlb_late_init_with_default_size(size_t default_size)
>   bytes = io_tlb_nslabs << IO_TLB_SHIFT;
> 
>   while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
> - io_tlb_start = (char *)__get_free_pages(GFP_DMA | __GFP_NOWARN,
> - order);
> + io_tlb_start = (char *)__get_dma_pages(__GFP_NOWARN, order);

__get_dma_pages() is just pointless obfuscation.  I think it'd be better to
go the other way: open-code the GFP_DMA at all callsites then send
__get_dma_pages() bitbucketwards.



Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes

2007-06-05 Thread Benjamin Herrenschmidt
On Tue, 2007-06-05 at 16:51 -0700, Nicholas Miell wrote:
> Yes, that's certainly wrong, but that's an implementation issue. I was
> more concerned about the design of the API.
> 
> Naively, I would expect a read on a signalfd to return either process
> signals or thread signals targeted towards the thread doing the read.
> 
> What it actually does (delivering process signals or thread signals
> targeted towards the thread that created the signalfd) is weird.
> 
> For one, it means you can't create a single signalfd, stick it in an
> epoll set, and then wait on that set from multiple threads. 

Heh, well, I'll let you discuss that aspect with Davide. Right now, I'm
just trying to make sure that the implementation in the kernel copes :-)

Ben.




Re: [PATCH 2/2] UML - Fix kernel stack size on x86_64

2007-06-05 Thread Andrew Morton
On Tue, 5 Jun 2007 16:50:55 -0400
Jeff Dike <[EMAIL PROTECTED]> wrote:

> [ This is 2.6.22 material ]
> 
> Having KERNEL_STACK_ORDER in defconfig overrides the value provided by
> Kconfig, breaking UML/x86_64, which wants 2 page stacks.
> 
> Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>
> --
>  arch/um/defconfig |1 -
>  1 file changed, 1 deletion(-)
> 
> Index: linux-2.6.21-mm/arch/um/defconfig
> ===
> --- linux-2.6.21-mm.orig/arch/um/defconfig	2007-06-05 12:18:35.0 -0400
> +++ linux-2.6.21-mm/arch/um/defconfig 2007-06-05 12:19:12.0 -0400
> @@ -86,7 +86,6 @@ CONFIG_MCONSOLE=y
>  # CONFIG_MAGIC_SYSRQ is not set
>  CONFIG_NEST_LEVEL=0
>  # CONFIG_HIGHMEM is not set
> -CONFIG_KERNEL_STACK_ORDER=0
>  CONFIG_UML_REAL_TIME_CLOCK=y
>  

That means the Kconfig rules are wrong, surely?


Re: [PATCH] sundance: PHY address form 0, only for device ID 0x0200 (IP100A) (20070605)

2007-06-05 Thread Samir Bellabes
Jesse Huang <[EMAIL PROTECTED]> writes:

Hi Jesse,

> - for (phy = 1; phy <= 32 && phy_idx < MII_CNT; phy++) {
> + if(sundance_pci_tbl[np->chip_id].device == 0x0200) 
> + phy = 0;
> + else 
> + phy = 1;
> + for (; phy <= 32 && phy_idx < MII_CNT; phy++) {

I think this value can be put in driver_data.
Attached patch is doing it, but I didn't test it.

tree 602e0c2def631e82635b4f8aad762e69184af143
parent 5ecd3100e695228ac5e0ce0e325e252c0f11806f
author Samir Bellabes <[EMAIL PROTECTED]> 1181086775 +0200
committer Samir Bellabes <[EMAIL PROTECTED]> 1181086775 +0200

Search PHY addresses from 0 only for device ID 0x0200 (IP100A). Other
devices start from PHY address 1.

Noticed by Jesse Huang <[EMAIL PROTECTED]>

Signed-off-by: Samir Bellabes <[EMAIL PROTECTED]>

--

 sundance.c |   40 +++-
 1 files changed, 31 insertions(+), 9 deletions(-)

--

diff --git a/drivers/net/sundance.c b/drivers/net/sundance.c
index e1f912d..fb59801 100644
--- a/drivers/net/sundance.c
+++ b/drivers/net/sundance.c
@@ -205,14 +205,36 @@ #ifndef CONFIG_SUNDANCE_MMIO
 #define USE_IO_OPS 1
 #endif
 
+enum cfg_version {
+   SUNDANCE_CFG_0 = 0x00,  
+   SUNDANCE_CFG_1,
+   SUNDANCE_CFG_2,
+   SUNDANCE_CFG_3,
+   SUNDANCE_CFG_4,
+   SUNDANCE_CFG_5,
+   SUNDANCE_CFG_6
+};
+
+static const struct {
+   unsigned int phy;
+} sundance_cfg_info[] = {
+   [SUNDANCE_CFG_0] = { 1 },
+   [SUNDANCE_CFG_1] = { 1 },
+   [SUNDANCE_CFG_2] = { 1 },
+   [SUNDANCE_CFG_3] = { 1 },
+   [SUNDANCE_CFG_4] = { 1 },
+   [SUNDANCE_CFG_5] = { 1 },
+   [SUNDANCE_CFG_6] = { 0 }
+};
+
 static const struct pci_device_id sundance_pci_tbl[] = {
-   { 0x1186, 0x1002, 0x1186, 0x1002, 0, 0, 0 },
-   { 0x1186, 0x1002, 0x1186, 0x1003, 0, 0, 1 },
-   { 0x1186, 0x1002, 0x1186, 0x1012, 0, 0, 2 },
-   { 0x1186, 0x1002, 0x1186, 0x1040, 0, 0, 3 },
-   { 0x1186, 0x1002, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 4 },
-   { 0x13F0, 0x0201, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 5 },
-   { 0x13F0, 0x0200, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 6 },
+   { 0x1186, 0x1002, 0x1186, 0x1002, 0, 0, SUNDANCE_CFG_0 },
+   { 0x1186, 0x1002, 0x1186, 0x1003, 0, 0, SUNDANCE_CFG_1 },
+   { 0x1186, 0x1002, 0x1186, 0x1012, 0, 0, SUNDANCE_CFG_2 },
+   { 0x1186, 0x1002, 0x1186, 0x1040, 0, 0, SUNDANCE_CFG_3 },
+   { 0x1186, 0x1002, PCI_ANY_ID, PCI_ANY_ID, 0, 0, SUNDANCE_CFG_4 },
+   { 0x13F0, 0x0201, PCI_ANY_ID, PCI_ANY_ID, 0, 0, SUNDANCE_CFG_5 },
+   { 0x13F0, 0x0200, PCI_ANY_ID, PCI_ANY_ID, 0, 0, SUNDANCE_CFG_6 },
{ }
 };
 MODULE_DEVICE_TABLE(pci, sundance_pci_tbl);
@@ -468,7 +490,7 @@ #else
int bar = 1;
 #endif
int phy, phy_idx = 0;
-
+   const unsigned int phy_start = sundance_cfg_info[ent->driver_data].phy;
 
 /* when built into the kernel, we only print version if device is found */
 #ifndef MODULE
@@ -562,7 +584,7 @@ #endif
 * It seems some phys doesn't deal well with address 0 being accessed
 * first, so leave address zero to the end of the loop (32 & 31).
 */
-   for (phy = 1; phy <= 32 && phy_idx < MII_CNT; phy++) {
+   for (phy = phy_start; phy <= 32 && phy_idx < MII_CNT; phy++) {
int phyx = phy & 0x1f;
int mii_status = mdio_read(dev, phyx, MII_BMSR);
if (mii_status != 0xffff  &&  mii_status != 0x0000) {
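
To make the effect of the starting address concrete, here is a small
user-space sketch of the probe loop. It is not driver code: struct chip_cfg,
cfg_info[] and phy_probe_order() are hypothetical stand-ins for the
driver_data lookup in the patch above, and the MII_CNT cut-off is left out.
It only records the order in which 5-bit PHY addresses would be visited.

```c
#include <stddef.h>

#define PHY_ADDR_MASK 0x1f	/* MII PHY addresses are 5 bits wide */

/* Hypothetical per-chip configuration, indexed by driver_data. */
struct chip_cfg {
	unsigned int phy_start;
};

static const struct chip_cfg cfg_info[] = {
	{ 1 },	/* most boards: probe 1..31, then 0 last */
	{ 0 },	/* IP100A-style: probe address 0 first */
};

/*
 * Fill `order` with the PHY addresses in the order the probe loop would
 * visit them and return how many entries were written.
 */
size_t phy_probe_order(unsigned long driver_data, unsigned int *order,
		       size_t max)
{
	size_t n = 0;
	unsigned int phy;

	/* Reject indices that fall outside the configuration table. */
	if (driver_data >= sizeof(cfg_info) / sizeof(cfg_info[0]))
		return 0;

	for (phy = cfg_info[driver_data].phy_start; phy <= 32 && n < max; phy++)
		order[n++] = phy & PHY_ADDR_MASK;

	return n;
}
```

One thing the sketch makes visible: with a start of 0 and the loop bound left
at 32, address 0 is visited both first and last (32 & 31 == 0). Whether that
matters for the real hardware is not something this example can answer.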


Re: [patch] cpusets: do not allow TIF_MEMDIE tasks to allocate globally

2007-06-05 Thread Christoph Lameter
On Tue, 5 Jun 2007, David Rientjes wrote:

> If that fails, we can't allocate elsewhere because then we have taken 
> exclusive memory from other applications and is contrary to the definition 
> of mem_exclusive.  You need to construct your cpuset hierarchy with these 
> scenarios in mind; when you ask for an exclusive cpuset, it shouldn't come 
> with a disclaimer that says "if another cpuset that is also exclusive 
> happens to OOM, we'll steal your memory anyway and it's not our problem if 
> the dying task gets stuck in D state and doesn't exit synchronously or 
> reliably because all we did was send it a SIGKILL."

Exclusive is not as absolute as you may think. There is also the 
GFP_KERNEL exception.

Processes stuck in D state is another issue with reliability.

> > So it seems that the patch is addressing an imagined situation?
> No, it's returning us to the previous logic where an exclusive cpuset was 
> actually exclusive.
> 
> And, again, without this change it is possible to allocate in other 
> exclusive cpusets without first exhausting your own memory reserves.  
> That's wrong.

That is already occurring with GFP_KERNEL. So your patch really does not 
have the purifying effect on exclusivity that you expect. This looks all 
more like hunting for elusive idealistic cpuset behavior to me.



Re: [patch] cpusets: do not allow TIF_MEMDIE tasks to allocate globally

2007-06-05 Thread Paul Jackson
> If that fails, we can't allocate elsewhere because then we have taken 
> exclusive memory from other applications and is contrary to the definition 
> of mem_exclusive. 

Well, I can't speak to the 'real' meaning of TIF_MEMDIE with authority,
but I can speak to the meaning of cpuset flags.

The mem_exclusive flag doesn't mean this.

It means that you cannot overlap the memory of a sibling cpuset.

You will, necessarily, still overlap the memory of your ancestor cpusets.

Whether or not you make any use of the mem_exclusive flag, you still
get the same (limited) guarantees of memory usage -- namely that your
memory won't be used by tasks in non-overlapping cpusets, with some
exceptions, such as:
 1) memory handed out to interrupt code,
 2) memory handed out for GFP_ATOMIC requests, and
 3) tasks marked PF_EXITING -- will soon free up memory

Tasks in cpusets ancestor to your tasks cpuset can always, easily,
use memory on the same nodes your task is on.
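
The sibling rule described above can be sketched as a toy check in plain C.
This is an illustration of the rule as stated in this message, not the kernel
code: struct toy_cpuset and sibling_mems_ok() are made-up names, and memory
nodes are reduced to bits in an unsigned long.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified cpuset: a bitmask of memory nodes plus the flag. */
struct toy_cpuset {
	unsigned long mems;	/* bit n set => memory node n allowed */
	bool mem_exclusive;
};

/*
 * Return true if `candidate` may sit next to `siblings`: an overlap in
 * memory nodes is only refused when either side is mem_exclusive.
 * Ancestors are deliberately not checked; they always overlap.
 */
bool sibling_mems_ok(const struct toy_cpuset *candidate,
		     const struct toy_cpuset *siblings, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		bool overlap = (candidate->mems & siblings[i].mems) != 0;

		if (overlap &&
		    (candidate->mem_exclusive || siblings[i].mem_exclusive))
			return false;
	}
	return true;
}
```

Note that nothing here says anything about allocations at run time; the flag
only constrains how sibling cpusets may be configured.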

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-05 Thread Darrick J. Wong
On Tue, Jun 05, 2007 at 02:14:51PM -0700, Siddha, Suresh B wrote:
 
> Can you send us your system's dmesg as well as output of /proc/interrupts?

http://sweaglesw.net/~djwong/docs/dmesg
http://sweaglesw.net/~djwong/docs/interrupts

--D




Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes

2007-06-05 Thread Nicholas Miell
On Tue, 2007-06-05 at 17:27 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2007-06-04 at 23:09 -0700, Nicholas Miell wrote:
> > signalfd() doesn't deliver thread-targeted signals to the wrong
> > threads, does it?
> > 
> > Hmm.
> > 
> > It looks like reading from a signalfd will give you either
> > process-global signals or the thread-specific signals that are
> > targeted towards the thread that originally created the signalfd
> > (regardless of which thread actually calls read()).
> > 
> > Which is weird, to say the least. Definitely needs to be noted in the
> > man page, which doesn't seem to exist yet.
> > 
> > Is there a reason why signalfd() doesn't behave like regular signals
> > in this regard?
> 
> It's worse than that ... by being able to call dequeue_signal from the
> context of another thread than the one dequeuing from.
> 
> Ben.

Yes, that's certainly wrong, but that's an implementation issue. I was
more concerned about the design of the API.

Naively, I would expect a read on a signalfd to return either process
signals or thread signals targeted towards the thread doing the read.

What it actually does (delivering process signals or thread signals
targeted towards the thread that created the signalfd) is weird.

For one, it means you can't create a single signalfd, stick it in an
epoll set, and then wait on that set from multiple threads.

-- 
Nicholas Miell <[EMAIL PROTECTED]>



Re: [patch] cpusets: do not allow TIF_MEMDIE tasks to allocate globally

2007-06-05 Thread David Rientjes
On Tue, 5 Jun 2007, Christoph Lameter wrote:

> But with the patch the process would be able to terminate. There is no 
> global OOM situation. If there would be a global OOM situation then 
> TIF_MEMDIE would not help.
> 

Sure it would, it would have access to memory reserves because of the 
change in watermarks through get_page_from_freelist().

If that fails, we can't allocate elsewhere because then we have taken 
exclusive memory from other applications and is contrary to the definition 
of mem_exclusive.  You need to construct your cpuset hierarchy with these 
scenarios in mind; when you ask for an exclusive cpuset, it shouldn't come 
with a disclaimer that says "if another cpuset that is also exclusive 
happens to OOM, we'll steal your memory anyway and it's not our problem if 
the dying task gets stuck in D state and doesn't exit synchronously or 
reliably because all we did was send it a SIGKILL."

> So it seems that the patch is addressing an imagined situation?
> 

No, it's returning us to the previous logic where an exclusive cpuset was 
actually exclusive.

And, again, without this change it is possible to allocate in other 
exclusive cpusets without first exhausting your own memory reserves.  
That's wrong.

David


Re: [stable] "[IPV6]: Fix routing round-robin locking." breaks manual default route (bug 8349)

2007-06-05 Thread David Miller
From: Chris Wright <[EMAIL PROTECTED]>
Date: Tue, 5 Jun 2007 16:36:16 -0700

> Rather than reverting that patch, applying this patch should fix
> your ipv6 issue:
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=7ebba6d14f8d63cad583bf1cc0330b601d5a8171
> 
> I'll wait for Dave or Yoshifuji to decide if this is a proper -stable
> patch.

I have a batch of stuff to send to -stable tonight and I
will make sure that this will be part of it.

Thanks for reminding me.


Re: [PATCH 4/4] mm: variable length argument support

2007-06-05 Thread Andrew Morton
On Tue, 05 Jun 2007 17:05:27 +0200
Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> From: Ollie Wild <[EMAIL PROTECTED]>
> 
> Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly
> from the old mm into the new mm.
> 
> We create the new mm before the binfmt code runs, and place the new stack
> at the very top of the address space. Once the binfmt code runs and figures
> out where the stack should be, we move it downwards.
> 
> It is a bit peculiar in that we have one task with two mm's, one of which is
> inactive.
> 
> ..
>
> 
> Index: linux-2.6-2/fs/binfmt_elf.c
> ===
> --- linux-2.6-2.orig/fs/binfmt_elf.c  2007-06-05 16:23:16.0 +0200
> +++ linux-2.6-2/fs/binfmt_elf.c   2007-06-05 16:29:45.0 +0200
> @@ -148,6 +148,7 @@ create_elf_tables(struct linux_binprm *b
>   elf_addr_t *elf_info;
>   int ei_index = 0;
>   struct task_struct *tsk = current;
> + struct vm_area_struct *vma;
>  
>   /*
>* If this architecture has a platform capability string, copy it
> @@ -234,6 +235,15 @@ create_elf_tables(struct linux_binprm *b
>   sp = (elf_addr_t __user *)bprm->p;
>  #endif
>  
> +
> + /*
> +  * Grow the stack manually; some architectures have a limit on how
> +  * far ahead a user-space access may be in order to grow the stack.
> +  */
> + vma = find_extend_vma(current->mm, bprm->p);
> + if (!vma)
> + return -EFAULT;
> +
>   /* Now, let's put argc (and argv, envp if appropriate) on the stack */
>   if (__put_user(argc, sp++))
>   return -EFAULT;
> @@ -254,8 +264,8 @@ create_elf_tables(struct linux_binprm *b
>   size_t len;
>   if (__put_user((elf_addr_t)p, argv++))
>   return -EFAULT;
> - len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES);
> - if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
> + len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
> + if (!len || len > MAX_ARG_STRLEN)

strnlen_user() is a scary function.  Please do remember that if the memory
we just strlen'ed is writeable by any user thread then that thread can at
any time invalidate the number which the kernel now holds.

>   return 0;
>   p += len;
>   }
> @@ -266,8 +276,8 @@ create_elf_tables(struct linux_binprm *b
>   size_t len;
>   if (__put_user((elf_addr_t)p, envp++))
>   return -EFAULT;
> - len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES);
> - if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
> + len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
> + if (!len || len > MAX_ARG_STRLEN)
>   return 0;
>   p += len;
>   }
>
> ...
>
> Index: linux-2.6-2/fs/compat.c
> ===
> --- linux-2.6-2.orig/fs/compat.c  2007-06-05 16:23:16.0 +0200
> +++ linux-2.6-2/fs/compat.c   2007-06-05 16:29:45.0 +0200
> @@ -1257,6 +1257,7 @@ static int compat_copy_strings(int argc,
>  {
>   struct page *kmapped_page = NULL;
>   char *kaddr = NULL;
> + unsigned long kpos = 0;
>   int ret;
>  
>   while (argc-- > 0) {
> @@ -1265,92 +1266,84 @@ static int compat_copy_strings(int argc,
>   unsigned long pos;
>  
>   if (get_user(str, argv+argc) ||
> - !(len = strnlen_user(compat_ptr(str), bprm->p))) {
> + !(len = strnlen_user(compat_ptr(str), MAX_ARG_STRLEN))) {
>   ret = -EFAULT;
>   goto out;
>   }
>  
> - if (bprm->p < len)  {
> + if (MAX_ARG_STRLEN < len) {
>   ret = -E2BIG;
>   goto out;
>   }

Do we have an off-by-one here?  Should it be <=?

If not, then this code is relying upon the string's terminating \0 coming
from userspace?  If so, that's buggy: userspace can overwrite the \0 after
we ran the strnlen_user(), perhaps, and confound the kernel?

I could be complete crap, but please check all this very closely.
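
To illustrate the concern in user space: a defensive pattern is to measure
with a hard bound, copy exactly that many bytes, and write your own
terminator instead of trusting the source's. The sketch below is an
assumption-laden stand-in, not the kernel code: TOY_MAX_ARG_STRLEN,
bounded_len() and snapshot_arg() are invented names, and ordinary memory
plays the role of user memory.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ARG_STRLEN 16	/* hypothetical limit, for illustration only */

/* Length of `s` without the NUL, or `max` if no NUL is found in time. */
static size_t bounded_len(const char *s, size_t max)
{
	size_t n = 0;

	while (n < max && s[n] != '\0')
		n++;
	return n;
}

/*
 * Copy one argument string out of `src` (standing in for user memory)
 * into `dst`.  Returns the string length without the NUL, -1 if the
 * string does not terminate within the limit, -2 if dst is too small.
 */
long snapshot_arg(const char *src, char *dst, size_t dst_size)
{
	size_t len = bounded_len(src, TOY_MAX_ARG_STRLEN);

	if (len == TOY_MAX_ARG_STRLEN)
		return -1;	/* no NUL within the limit: too big */
	if (len + 1 > dst_size)
		return -2;

	/*
	 * Even if the source changed after it was measured, the copy is
	 * bounded by `len` and the terminator below is our own.
	 */
	memcpy(dst, src, len);
	dst[len] = '\0';
	return (long)len;
}
```

If the source is rewritten between the measurement and the copy, the snapshot
may hold different bytes than were measured, but it cannot overrun dst or end
up unterminated, which is the property being asked about here.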


> +/*
> + * Create a new mm_struct and populate it with a temporary stack
> + * vm_area_struct.  We don't have enough context at this point to set the
> + * stack flags, permissions, and offset, so we use temporary values.  We'll
> + * update them later in setup_arg_pages().
> + */
> +int bprm_mm_init(struct linux_binprm *bprm)
> +{
> + int err;
> + struct mm_struct *mm = NULL;
> + struct vm_area_struct *vma = NULL;
> +
> + bprm->mm = mm = mm_alloc();
> + err = -ENOMEM;
> + if (!mm)
> + goto err;
> +
> + if ((err = init_new_context(current, mm)))
> + goto err;

	err = init_new_context(current, mm);
if (err)
goto err;

> +#ifdef 

Re: [PATCH 3/4] mm: move_page_tables{,_up}

2007-06-05 Thread Andrew Morton
On Tue, 05 Jun 2007 17:05:26 +0200
Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> Provide functions for moving page tables upwards.
> 
> ...
>
> +extern unsigned long move_page_tables(struct vm_area_struct *vma,
> + unsigned long old_addr, struct vm_area_struct *new_vma,
> + unsigned long new_addr, unsigned long len);
> +extern unsigned long move_page_tables_up(struct vm_area_struct *vma,
> + unsigned long old_addr, struct vm_area_struct *new_vma,
> + unsigned long new_addr, unsigned long len);
>  extern unsigned long do_mremap(unsigned long addr,
>  unsigned long old_len, unsigned long new_len,
>  unsigned long flags, unsigned long new_addr);

They become kernel-wide

> +static void move_ptes_up(struct vm_area_struct *vma, pmd_t *old_pmd,
> + unsigned long old_addr, unsigned long old_end,
> + struct vm_area_struct *new_vma, pmd_t *new_pmd,
> + unsigned long new_addr)

So some documentation might be in order...



Re: [PATCH 2/4] audit: rework execve audit

2007-06-05 Thread Andrew Morton
On Tue, 05 Jun 2007 17:05:25 +0200
Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> The purpose of audit_bprm() is to log the argv array to a userspace daemon at
> the end of the execve system call. Since user-space hasn't had time to run,
> this array is still in pristine state on the process' stack; so no need to copy
> it, we can just grab it from there.
> 
> In order to minimize the damage to audit_log_*() copy each string into a
> temporary kernel buffer first.
> 
> Currently the audit code requires that the full argument vector fits in a
> single packet. So currently it does clip the argv size to a (sysctl) limit, but
> only when execve auditing is enabled.
> 
> If the audit protocol gets extended to allow for multiple packets this check
> can be removed.
> 
> ...
>  

Please try to avoid trigger-happiness with the BUG_ON()s..

>  struct audit_aux_data_socketcall {
> @@ -834,6 +834,47 @@ static int audit_log_pid_context(struct 
>   return rc;
>  }
>  
> +static void audit_log_execve_info(struct audit_buffer *ab,
> + struct audit_aux_data_execve *axi)
> +{
> + int i;
> + long len;
> + const char __user *p = (const char __user *)axi->mm->arg_start;
> +
> + if (axi->mm != current->mm)
> + return; /* execve failed, no additional info */
> +
> + for (i = 0; i < axi->argc; i++, p += len) {
> + long ret;
> + char *tmp;
> +
> + len = strnlen_user(p, MAX_ARG_PAGES*PAGE_SIZE);
> + /*
> +  * We just created this mm, if we can't find the strings
> +  * we just copied in something is _very_ wrong.
> +  */
> + BUG_ON(!len);
> +
> + tmp = kmalloc(len, GFP_KERNEL);
> + if (!tmp) {
> + audit_panic("out of memory for argv string\n");
> + break;
> + }
> +
> + ret = copy_from_user(tmp, p, len);
> + /*
> +  * There is no reason for this copy to be short.
> +  */
> + BUG_ON(ret);

You sure?  What happens if another thread does munmap() in parallel?

I think I'll make this WARN_ON just out of principle.
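
To illustrate failing softly here, a minimal userspace sketch of the same per-string loop follows. It is not the kernel code: log_packed_args() and its parameters are hypothetical, malloc()/memcpy() stand in for kmalloc()/copy_from_user(), and a warning plus early exit stands in for WARN_ON.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace analogue of the loop under review (all names hypothetical).
 * "area" stands in for the argument area starting at mm->arg_start:
 * argc NUL-terminated strings packed back to back.
 * Returns the number of strings logged. It stops early with a warning,
 * instead of aborting, when a string cannot be found or copied.
 */
static int log_packed_args(FILE *out, const char *area, size_t area_len,
			   int argc)
{
	const char *p = area;
	size_t left = area_len;
	int i;

	for (i = 0; i < argc; i++) {
		const char *nul = memchr(p, '\0', left);
		size_t len;
		char *tmp;

		if (!nul) {
			/* No terminator in range: warn and bail out. */
			fprintf(stderr, "warning: a%d is not terminated\n", i);
			break;
		}
		len = (size_t)(nul - p) + 1;	/* length including the NUL */

		tmp = malloc(len);
		if (!tmp) {
			fprintf(stderr, "warning: out of memory for a%d\n", i);
			break;
		}
		memcpy(tmp, p, len);
		fprintf(out, "a%d=%s\n", i, tmp);
		free(tmp);

		p += len;
		left -= len;
	}
	return i;
}
```

The caller can compare the return value with argc to see whether the record is complete, which is roughly the information a WARN_ON would surface without taking the machine down.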

> + audit_log_format(ab, "a%d=", i);
> + audit_log_untrustedstring(ab, tmp);
> + audit_log_format(ab, "\n");
> +
> + kfree(tmp);
> + }
> +}
> +
>
> ...
>
> ===
> --- linux-2.6-2.orig/fs/exec.c2007-06-05 09:51:42.0 +0200
> +++ linux-2.6-2/fs/exec.c 2007-06-05 10:03:11.0 +0200
> @@ -1154,6 +1154,7 @@ int do_execve(char * filename,
>  {
>   struct linux_binprm *bprm;
>   struct file *file;
> + unsigned long tmp;
>   int retval;
>   int i;
>  
> @@ -1208,9 +1209,11 @@ int do_execve(char * filename,
>   if (retval < 0)
>   goto out;
>  
> + tmp = bprm->p;
>   retval = copy_strings(bprm->argc, argv, bprm);
>   if (retval < 0)
>   goto out;
> + bprm->argv_len = tmp - bprm->p;





--- a/include/linux/kernel.h~a
+++ a/include/linux/kernel.h
@@ -5,6 +5,8 @@
  * 'kernel.h' contains some often-used function prototypes etc
  */
 
+#define tmp don't call your variables tmp!
+
 #ifdef __KERNEL__
 
 #include 
_
  





Re: Linux 2.6.22-rc4 - sata_promise regression since -rc3

2007-06-05 Thread walt

Jeff Garzik wrote:
> On Tue, Jun 05, 2007 at 11:31:46PM +0200, Mikael Pettersson wrote:
> > I can easily reproduce the problem in 2.6.22-rc4. There are no
> > sata_promise changes between rc3 and rc4, but Tejun's libata
> > polling SETXFER change was included in rc4. Reverting it makes
> > sata_promise work again for me.
>
> Ugh.

Reverting Tejun's patch also fixed my boot failure.

00:08.0 RAID bus controller: Promise Technology, Inc. PDC20376 (FastTrak 376) (rev 02)

Subsystem: ASUSTeK Computer Inc. A7V8X motherboard
Flags: bus master, 66MHz, medium devsel, latency 96, IRQ 16
I/O ports at d400 [size=64]
I/O ports at d000 [size=16]
I/O ports at b800 [size=128]
Memory at f680 (32-bit, non-prefetchable) [size=4K]
Memory at f600 (32-bit, non-prefetchable) [size=128K]
Capabilities: [60] Power Management version 2

00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP])
Subsystem: ASUSTeK Computer Inc. A7V8X / A7V333 motherboard
Flags: bus master, medium devsel, latency 32, IRQ 255
[virtual] Memory at 01f0 (32-bit, non-prefetchable) [size=8]
[virtual] Memory at 03f0 (type 3, non-prefetchable) [size=1]
[virtual] Memory at 0170 (32-bit, non-prefetchable) [size=8]
[virtual] Memory at 0370 (type 3, non-prefetchable) [size=1]
I/O ports at a400 [size=16]
Capabilities: [c0] Power Management version 2

The problem is that the controller can't be initialized properly, so
the kernel keeps trying every 5 seconds.  I gave up after about five
failures.  If the actual boot error messages would help I'll need to
copy them by hand -- just say the word.




Re: [stable] "[IPV6]: Fix routing round-robin locking." breaks manual default route (bug 8349)

2007-06-05 Thread Chris Wright
* Simon Arlott ([EMAIL PROTECTED]) wrote:
> Adding a ::/0 route doesn't work:
> # ip -6 r a ::/0 via fe80::230:18ff:feb0:25c2 dev eth0
> # ping6 -c 1 2001:4b10:1005:0:205:b4ff:fe12:530
> connect: Network is unreachable
> 
> A route assigned by addrconf works.
> 
> Reverting this patch from 2.6.22-rc3 fixes it:
>   commit f11e6659ce9058928d73ff440f9b40a818d628ab
>   Author: David S. Miller <[EMAIL PROTECTED]>
>   Date:   Sat Mar 24 20:36:25 2007 -0700
>   [IPV6]: Fix routing round-robin locking.
> 
> This patch was added to 2.6.20.5, breaking -stable too.

Rather than reverting that patch, applying this patch should fix
your ipv6 issue:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=7ebba6d14f8d63cad583bf1cc0330b601d5a8171

I'll wait for Dave or Yoshifuji to decide if this is a proper -stable
patch.

thanks,
-chris


Re: [patch] cpusets: do not allow TIF_MEMDIE tasks to allocate globally

2007-06-05 Thread Christoph Lameter
On Tue, 5 Jun 2007, David Rientjes wrote:

> > But the alternative is that the exiting process does not save its 
> > data.
> The same condition that occurs when there is a system-wide OOM, yes.  
> Exclusive cpusets cannot be violated for such allocations outside of the 
> obvious GFP_ATOMIC exception.

But with the patch the process would be able to terminate. There is no 
global OOM situation. If there would be a global OOM situation then 
TIF_MEMDIE would not help.

> > What is this very small exclusive cpuset?
> That's arbitrary.  The idea is that an exclusive cpuset should not 
> encounter memory pressure because another exclusive cpuset encountered an 
> OOM condition because its zones happened to be higher on the zonelist.  
> Notice how, without this change, it's possible to allocate on a node 
> outside our mems_allowed before we use our own memory reserves.

So it seems that the patch is addressing an imagined situation?

I think the allocation outside of our mems_allowed is fine when it serves 
to terminate the process and thereby release resources. It is certainly 
better than having the process corrupt data by only partially writing back 
its data.




Re: [patch] cpusets: do not allow TIF_MEMDIE tasks to allocate globally

2007-06-05 Thread David Rientjes
On Tue, 5 Jun 2007, Christoph Lameter wrote:

> But the alternative is that the exiting process does not save its 
> data.
> 

The same condition that occurs when there is a system-wide OOM, yes.  
Exclusive cpusets cannot be violated for such allocations outside of the 
obvious GFP_ATOMIC exception.

> What is this very small exclusive cpuset?
> 

That's arbitrary.  The idea is that an exclusive cpuset should not 
encounter memory pressure because another exclusive cpuset encountered an 
OOM condition because its zones happened to be higher on the zonelist.  
Notice how, without this change, it's possible to allocate on a node 
outside our mems_allowed before we use our own memory reserves.

David


Re: 2.6.22-rc: regression: no irda0 interface (2.6.21 was OK), smsc does not find chip

2007-06-05 Thread Bjorn Helgaas
On Tuesday 05 June 2007 05:57:30 am Linus Walleij (LD/EAB) wrote:
> You don't need to alter the defaults for the Toshiba ALi, the 
> preconfigure will respect the settings from the commandline,
> e.g. modprobe smsc-ircc2 ircc_fir=0x100,ircc_sir=0x02e8.
> 
> BUT this value just won't work: we don't know how to tell the ALi 1533
> to use any other ports than 0x130,0x178,0x03f8,0x02f8 or 0x02e8.

Something's wrong with this strategy.  The BIOS is telling us that an
SMCf010 device is present, active, and responds at io ports 0x100-0x107
and 0x2e8-0x2ef.  The fact that it happens to be on the other side of
an ISA or LPC bridge should be immaterial to the OS driver.

If an ACPI BIOS says the device is active, I don't think the OS should
have to preconfigure anything to make it work.  I don't know whether
this is just a broken BIOS on this machine, or whether we don't know
how to use it correctly yet.  The fact that we *do* have to preconfigure
so much stuff in smsc-ircc2.c makes me think that Linux is missing
something important in the way we deal with ISA and LPC bridges.

Andrey, can you collect your ACPI DSDT and "lspnp -vv" [1] output?
Maybe there will be a clue there.

[1] ftp://ftp.kernel.org/pub/linux/kernel/people/helgaas/pnputils-0.1.tar.bz2


Re: [patch] cpusets: do not allow TIF_MEMDIE tasks to allocate globally

2007-06-05 Thread Paul Jackson
> The intended purpose of TIF_MEMDIE was to allocate pages without being 

Ok then ... you're probably right.  I'll stand down.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: [patch] cpusets: do not allow TIF_MEMDIE tasks to allocate globally

2007-06-05 Thread Christoph Lameter
On Tue, 5 Jun 2007, David Rientjes wrote:

> No, it means that it can allocate anywhere based on the zonelist ordering 
> and then can OOM a very small exclusive cpuset that would never have had 
> any memory pressure if it wasn't violated.

But the alternative is that the exiting process does not save its 
data.

What is this very small exclusive cpuset?




Re: [patch] cpusets: do not allow TIF_MEMDIE tasks to allocate globally

2007-06-05 Thread David Rientjes
On Tue, 5 Jun 2007, Paul Jackson wrote:

> I'm a little surprised at this suggested change -- I'd have thought
> that it was a good idea to let tasks marked for extinction get memory
> anywhere, as they were going to use that memory to exit, and free up
> lots more memory.
> 

The intended purpose of TIF_MEMDIE was to allocate pages without being 
bound by the watermarks so that they have access to memory reserves on the 
per-zone level.  If the cpuset doesn't have access to a zone, whether it's 
memory reserve or not, it shouldn't allocate there.

> I'm pretty sure we have this same policy in other places in the
> kernel, besides cpusets.  Did you intend to change them too?
> 

You'd have to cite them first.

> If a MEMDIE task is taking enough memory to OOM other tasks anywhere
> in the system, then doesn't that mean your entire system was in deep
> yogurt, and we're just haggling over who to blame for the upcoming
> crash?
> 

No, it means that it can allocate anywhere based on the zonelist ordering 
and then can OOM a very small exclusive cpuset that would never have had 
any memory pressure if it wasn't violated.
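
The two policies being argued over can be sketched with a toy model. Everything below is hypothetical (struct toy_zone, toy_pick_zone() and the flag names are made up for illustration and greatly simplify the real allocator); it only shows how zonelist ordering interacts with the watermark and mems_allowed checks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model; every name here is hypothetical, not the kernel's allocator. */
struct toy_zone {
	int node;		/* NUMA node this zone belongs to */
	long free_pages;	/* pages currently free */
	long watermark;		/* pages normally held back as reserve */
};

/*
 * Walk the zonelist in order and return the index of the first zone that
 * may satisfy a one-page allocation, or -1 if none can.
 * mems_allowed is a bitmask of permitted nodes.
 * memdie lets the task dip below the watermark (use the reserves).
 * memdie_ignores_cpuset selects between the two policies under discussion.
 */
static int toy_pick_zone(const struct toy_zone *zl, size_t n,
			 unsigned long mems_allowed, bool memdie,
			 bool memdie_ignores_cpuset)
{
	for (size_t i = 0; i < n; i++) {
		bool allowed = (mems_allowed & (1UL << zl[i].node)) != 0;
		long floor = memdie ? 0 : zl[i].watermark;

		if (!allowed && !(memdie && memdie_ignores_cpuset))
			continue;
		if (zl[i].free_pages > floor)
			return (int)i;
	}
	return -1;
}
```

With a zonelist whose first entry belongs to a foreign node with plenty of free memory and whose second entry is our own node sitting below its watermark, a MEMDIE task that ignores the cpuset lands on the foreign zone first. With the cpuset respected it falls through to its own reserves.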

David


Re: 2.6.22-rc4: known regressions [section mismatch]

2007-06-05 Thread Jeff Chua

On 6/2/07, Michal Piotrowski <[EMAIL PROTECTED]> wrote:
> I think that Sam is working on this.
> http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git=search=f285e3d329ce68cc355fadf4ab2c8f34d7f264cb=commit=section+mismatch

I still got the section mismatch errors on 2.6.22-rc4. Here are the details ...

WARNING: arch/i386/kernel/built-in.o(.text+0x9c7f): Section mismatch:
reference to .init.text: (between 'init_intel_cacheinfo' and
'cache_shared_cpu_map_setup')
WARNING: arch/i386/kernel/built-in.o(.text+0xae43): Section mismatch:
reference to .init.text: (between 'mtrr_bp_init' and 'mtrr_ap_init')
WARNING: arch/i386/kernel/built-in.o(.text+0xafb1): Section mismatch:
reference to .init.text: (between 'mtrr_bp_init' and 'mtrr_ap_init')
WARNING: arch/i386/kernel/built-in.o(.text+0xafb6): Section mismatch:
reference to .init.text: (between 'mtrr_bp_init' and 'mtrr_ap_init')
WARNING: arch/i386/kernel/built-in.o(.text+0xb97d): Section mismatch:
reference to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
WARNING: kernel/built-in.o(.text+0x17a93): Section mismatch: reference
to .init.text: (between 'kthreadd' and 'init_waitqueue_head')


Only happens after recompile.

Thanks,
Jeff.


Re: [1/2] 2.6.22-rc4: known regressions with patches

2007-06-05 Thread David Miller
From: Michal Piotrowski <[EMAIL PROTECTED]>
Date: Tue, 05 Jun 2007 16:54:28 +0200

> Networking
> 
> Subject: OOPS iproute2/tc/u32_destroy in 2.6.22-rc3-git6
> References : http://lkml.org/lkml/2007/6/3/66
> Submitter  : Strobl Anton <[EMAIL PROTECTED]>
> Handled-By : Patrick McHardy <[EMAIL PROTECTED]>
> Patch  : http://lkml.org/lkml/2007/6/3/137
> Status : patch available

This slipped through the cracks, I've applied Patrick's
patch and will push upstream.

Thanks!


Re: [patch] cpusets: do not allow TIF_MEMDIE tasks to allocate globally

2007-06-05 Thread Paul Jackson
> OOM-killed tasks, marked as TIF_MEMDIE, should not be able to access 
> memory outside its cpuset because it could potentially cause other 
> exclusive cpusets to OOM themselves.

I'm a little surprised at this suggested change -- I'd have thought
that it was a good idea to let tasks marked for extinction get memory
anywhere, as they were going to use that memory to exit, and free up
lots more memory.

I'm pretty sure we have this same policy in other places in the
kernel, besides cpusets.  Did you intend to change them too?

If a MEMDIE task is taking enough memory to OOM other tasks anywhere
in the system, then doesn't that mean your entire system was in deep
yogurt, and we're just haggling over who to blame for the upcoming
crash?

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: [PATCH TRIVIAL] icom whitespace cleanups

2007-06-05 Thread Chris Snook

Paul Mackerras wrote:
> Chris Snook writes:
>
> > Clean up whitespace and comments in drivers/serial/icom.c
>
> These changes seem totally unnecessary, as the existing indentation is
> according to a commonly-accepted style and is quite reasonable:

There are actually a few different indentation styles used inconsistently
throughout, which I personally find annoying, and there's lots of whitespace
damage.  If I've really overstepped the bounds, I'll resubmit a somewhat smaller
patch, but I'm inclined to make it look pretty while I've got the hood up.

> Also you don't seem to have cc'd the driver author.

Can't find any contact info.  Care to enlighten me?

-- Chris


Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes

2007-06-05 Thread Benjamin Herrenschmidt
On Tue, 2007-06-05 at 15:50 -0700, Davide Libenzi wrote:
> > What about the code in __dequeue_signal though ? That notifier thing is
> > used by the DRI though I'm not sure what would happen if it acts on the
> > wrong task.
> 
> Hmm, looking at the comments in block_all_signals(), it seems that they're
> interested in the fact that a specific task dequeues the signal. So, at
> first sight, it seems that such code should not be executed if
> another task dequeues the message. What do you think?

Yes, I think the idea is that the DRM uses that to prevent signals from being
delivered to the task that is blocking them with the notifier (I have no
idea why they can't use the normal block mechanism for that... looks like
a hack to me).

So I suppose it's fine, as long as you add a test of tsk == current to
avoid calling it.
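
Roughly, the guard could look like the toy model below. This is only a sketch: struct toy_task, toy_veto() and toy_dequeue() are hypothetical stand-ins, not the real __dequeue_signal() code, and the "notifier returns 0 to hold the signal back" convention is an assumption of the model.

```c
#include <stddef.h>

/* Toy stand-in for the per-task notifier state (hypothetical names). */
struct toy_task {
	int (*notifier)(void *priv);	/* optional callback; may be NULL */
	void *notifier_data;
	int pending;			/* nonzero if a signal is queued */
};

/* Example notifier that always asks for the signal to be held back. */
static int toy_veto(void *priv)
{
	(void)priv;
	return 0;
}

/*
 * Dequeue on behalf of tsk. The notifier is consulted only when the
 * task doing the dequeue (current_task) is the task that installed it.
 * Returns 1 if the signal was dequeued, 0 if it was left pending.
 */
static int toy_dequeue(struct toy_task *tsk, struct toy_task *current_task)
{
	if (!tsk->pending)
		return 0;
	if (tsk == current_task && tsk->notifier) {
		if (!tsk->notifier(tsk->notifier_data))
			return 0;	/* notifier held the signal back */
	}
	tsk->pending = 0;
	return 1;
}
```

When another task dequeues on tsk's behalf, the callback is simply skipped, so it never runs in the wrong context.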

Ben.




Re: 2.6.22-rc1-mm1

2007-06-05 Thread H. Peter Anvin
Andy Whitcroft wrote:
>>>
>> It definitely sounds like a memory clobber of some sort.
>>
>> Usual suspects, in addition to the input/output buffers you already
>> looked at, would be the heap and the stack.  Finding where the stack
>> pointer lives would be my first, instinctive guess.
> 
> The stack seems to be where it should be and seems to stay pretty much
> in the same place as it should.  Adding checks for the heap also seem to
> stay within bounds.  I've tried making the stack and the heap 64k to no
> effect.
> 
> Moving the kernel to other places in memory seems to kill the decode
> completely during gunzip() which may be a hint I am not sure.
> 
> This thing is trying to ruin my mind.
> 

Yours and mine both.  Seems like *something* is clobbering memory, but
what and why is a mystery.  The fact that putting the kernel at a higher
point in memory kills the decode is a good indication that this clobber is
at a relatively high address.

How much RAM does this machine have?

-hpa


Re: [PATCH] Protection for exploiting null dereference using mmap

2007-06-05 Thread Chris Wright
* Eric Paris ([EMAIL PROTECTED]) wrote:
> One result of using the dummy hook for non-selinux kernels means that I
> can't leave the generic module stacking code in the SELinux check.  If
> the secondary ops are called they will always deny the operation just
> like in non-selinux systems even if SELinux policy would have allowed
> the action.  This patch may be the first step to removing the arbitrary
> LSM module stacking code from SELinux.  I think history has shown the
> arbitrary module stacking is not a good idea and eventually I want to
> pull out all the secondary calls which aren't used by the capability
> module, so I view this as just the first step along those lines.

Or replace them all with direct library calls to the capability code.

