date:20071117

Re: [PATCHv3 0/4] sys_indirect system call

2007-11-17 Thread H. Peter Anvin


Ulrich Drepper wrote:

> The following patches provide an alternative implementation of the
> sys_indirect system call which has been discussed a few times.
> This new system call allows us to extend existing system call
> interfaces without adding more system calls.
>
> Davide's previous implementation is IMO far more complex than
> warranted.  This code here is trivial, as you can see.  I've
> discussed this approach with Linus last week and for a brief moment
> we actually agreed on something.
>
> We pass an additional block of data to the kernel, it is copied into
> the task_struct, and then it is up to the function implementing the system
> call to interpret the data.  Each system call which is meant to be
> extended this way has to be white-listed in sys_indirect.  The
> alternative is to filter out those system calls which absolutely cannot
> be handled using sys_indirect (like clone, execve) since they require
> the stack layout of an ordinary system call.  This is more dangerous
> since it is too easy to miss a call.



I stared at this a bit, and it took me some time to try to grok what it 
is trying to do.  Eventually I figured it out, and I wonder if there 
isn't an easier -- or at least more efficient -- way to accomplish this 
goal.


It seems to me that we could accomplish the same thing by passing the 
number of parameters in the upper bits of the system call number 
register (%eax in the case of x86.)  If set to zero, we'd fill in the 
legacy number of registers (for backwards compatibility.)  Unspecified 
arguments are then forced to zero before invoking the target function; 
we could also make a register count available if need be.
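
A minimal user-space sketch of the upper-bits encoding, assuming an
8-bit count field above a 24-bit call number.  The field widths, the
legacy_nargs table, and the encode_call()/decode_call() helpers are all
hypothetical and are not taken from any posted patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ARGS     6
#define NARGS_SHIFT  24                 /* assumed split: top 8 bits = count */
#define NR_MASK      ((UINT32_C(1) << NARGS_SHIFT) - 1)

/* Hypothetical legacy argument counts, indexed by system call number. */
static const uint8_t legacy_nargs[] = { 0, 1, 3, 3 };
#define NR_CALLS (sizeof(legacy_nargs) / sizeof(legacy_nargs[0]))

struct decoded_call {
	uint32_t nr;            /* system call number with the count stripped */
	unsigned int nargs;     /* effective number of caller-supplied arguments */
	long args[MAX_ARGS];    /* unspecified slots are forced to zero */
};

/* Pack a call number and an argument count into one register value. */
static uint32_t encode_call(uint32_t nr, unsigned int nargs)
{
	assert(nr <= NR_MASK && nargs <= MAX_ARGS);
	return nr | ((uint32_t)nargs << NARGS_SHIFT);
}

/* Returns 0 on success, -1 for an unknown call or an impossible count. */
static int decode_call(uint32_t reg, const long regs[MAX_ARGS],
		       struct decoded_call *out)
{
	uint32_t nr = reg & NR_MASK;
	unsigned int nargs = reg >> NARGS_SHIFT;
	size_t i;

	if (nr >= NR_CALLS || nargs > MAX_ARGS)
		return -1;
	if (nargs == 0)                 /* legacy caller: use the old count */
		nargs = legacy_nargs[nr];

	out->nr = nr;
	out->nargs = nargs;
	for (i = 0; i < MAX_ARGS; i++)
		out->args[i] = (i < nargs) ? regs[i] : 0;
	return 0;
}
```

In this sketch a count of zero means "legacy caller", so old binaries
keep working unchanged, while a newer caller that sets the count gets
every argument beyond it zeroed before the target function runs.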


Alternatively, the same thing can be done with a dense system call 
number space by adding a number-of-parameters field to the system call 
table.  However, that is more invasive in that one has to poke something 
into each architecture (unfortunately; it would be so much nicer if 
there were a central metafile which one could process into the various 
architecture system call tables).


-hpa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[BUG] 2.6.24-rc3 can't see sd partitions on Alpha

2007-11-17 Thread Bob Tracy

Completely reproducible... 2.6.24-rc3 kernel boots, and normal messages
are seen on console as far as disks found and partitions on each.  However,
once /dev is populated and the boottime scripts attempt to check filesystem
status, no partitions on either of the two disks attached to the SCSI
controller are seen.  Dropping into a single-user root shell confirms
the sudden "blindness": fdisk can't open /dev/sda.

When I reboot on 2.6.24-rc2, everything works normally.

System environment is Debian Etch.  Both 2.6.24-rc2 and -rc3 were built
from the respective unaltered kernel.org source trees, using the same
kernel configuration modulo saying "no" to CONFIG_SENSORS_I5K_AMB and
CONFIG_PID_NS in -rc3.  No problems with -rc3 on an x86 box.

-- 

Bob Tracy  |  "They couldn't hit an elephant at this dist- "
[EMAIL PROTECTED]   |   - Last words of Union General John Sedgwick,
   |  Battle of Spotsylvania Court House, U.S. Civil War


Re: [RFC/PATCH] Optimize zone allocator synchronization

2007-11-17 Thread Don Porter

Thank you all for your consideration and insightful responses to my
posting.  I apologize for not responding sooner---I have been under a
deadline.

It seems clear that further investigation will be needed to understand
these performance numbers better.

To summarize, I understand that the following experiments will be helpful:

1) Instrument the allocation code to determine the common size/order
of the allocations for these workloads.

2) Try to integrate these changes with ticket spinlocks.

3) Try placing the zone lock in its own cacheline.

4) Look for single-threaded regressions (dd benchmark).
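
For experiment 3, a small user-space sketch of what "its own cacheline"
means in layout terms.  The struct names, the field names, and the
64-byte line size are assumptions for illustration; this is not the
kernel's struct zone:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64      /* assumption: typical x86 line size */

/* Lock and hot counters packed together: they share one cache line. */
struct zone_shared {
	int lock;
	long free_pages;
	long nr_allocs;
};

/* Lock alone on the first line; the counters start on the next one. */
struct zone_padded {
	alignas(CACHE_LINE_SIZE) int lock;
	alignas(CACHE_LINE_SIZE) long free_pages;
	long nr_allocs;
};

/* Index of the cache line that a given struct offset falls into. */
static size_t line_of(size_t offset)
{
	return offset / CACHE_LINE_SIZE;
}

static_assert(offsetof(struct zone_padded, free_pages) % CACHE_LINE_SIZE == 0,
	      "counters must start on a fresh cache line");
```

With the padded layout, CPUs spinning on the lock no longer invalidate
the line that holds the counters, at the cost of a larger structure.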

I'll do these at my first opportunity, hopefully within the next week.
Please let me know if I misunderstood any of your comments.

My intuition about the cost of ping-ponging the lock's cache line
certainly matched yours, so I was very surprised to see these
performance numbers.  

On Wed, Nov 07, 2007 at 04:31:59PM +1100, Nick Piggin wrote:
> It's funny, Dave Miller and I were just talking about the possible
> reappearance of zone->lock contention with massively multi core and
> multi threaded CPUs. I think the right way to fix this in the long run
> if it turns into a real problem, is something like having a lock per
> MAX_ORDER block, and having CPUs prefer to allocate from different
> blocks. Anti-frag makes this pretty interesting to implement, but it
> will be possible.

As a bit of background, the zone lock is indeed one of the more
contended locks in my target workloads so it was no accident that I
was looking for ways to improve its scalability.  I am quite
interested in Nick's ideas about how to split up the zone allocator's
synchronization.

Of course, these contention levels may not meet your definition of
"real problem" (~.1% of the execution time).

Best regards,
Don

increased number of cycles

2007-11-17 Thread kernel coder

Hi,
I'm trying to add some code to the netif_receive_skb() function in
dev.c.  The code consumed around 16 cycles on a dual-core Opteron
machine.  I have been working on this code for the last 6 months,
and the cost has always been around 16 cycles.  I don't touch any
other part of the kernel.

But for the last 4 days the cost has suddenly increased to around
35 cycles.  I'm using the RDTSC instruction to profile the code.
There is no change in the code, and the kernel version is also the
same.  I am assuming that there must be something wrong with the
hardware.

Please guide me on how I can figure out the root cause.  What areas
should I look at to find the reason for the increased number of
cycles?  I don't think there is any issue in the kernel, because the
kernel version and code are the same.  Can the log messages during
system bootup help me to diagnose the problem?
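
A single RDTSC delta is noisy, so one possible first step is to collect
many deltas and compare their minimum and median rather than one
reading.  The sketch below only summarizes samples that are assumed to
have been gathered already; summarize_cycles() and struct cycle_stats
are hypothetical names, and the sample values used with it are made up:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct cycle_stats {
	uint64_t min;
	uint64_t median;
	uint64_t max;
};

/* qsort() comparator for unsigned 64-bit cycle counts. */
static int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/* Sorts the samples in place and reports min, median and max. */
static struct cycle_stats summarize_cycles(uint64_t *samples, size_t n)
{
	struct cycle_stats s;

	assert(n > 0);
	qsort(samples, n, sizeof(samples[0]), cmp_u64);
	s.min = samples[0];
	s.median = samples[n / 2];
	s.max = samples[n - 1];
	return s;
}
```

If the minimum is still near 16 while the median has moved, the code
path itself is probably unchanged and something around it (interrupts,
cache misses, frequency scaling) is adding cost; if the minimum itself
has moved, the measured path or the time base has likely changed.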


shahzad

Re: [REQUEST] Option for skipping unreadable blocks on Video DVD

2007-11-17 Thread Robert Hancock


Tobias wrote:
> If you are accessing a scratched Video DVD and the device cannot read it, the 
> process ends. 
> What about a more tolerant way to handle unreadable blocks?
> Especially on Video DVDs, single blocks are not as important as on data 
> DVDs.


If the DVD player process ends from this, I'd say that's the fault of 
the player software not handling errors properly.


I think that if they are using the normal block layer accesses on the 
DVD device, there may be some retries that occur which are likely 
undesirable in this case since they will just stall playback. If they 
are using SG_IO to feed raw requests into the drive (which I imagine 
they need to do for CSS authentication, etc. anyway), then all error 
handling is passed up to the user application.




> So is there a way that the kernel tells the device to skip these bad blocks?


We don't know they're bad until we try and read them. How long the drive 
will stall trying to read that sector before giving up and returning an 
error is up to the drive. I'm not sure if the MMC command set allows any 
way to tell the drive to give up more quickly or not.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: [RFC HIFN 00/02]: RNG support

2007-11-17 Thread Herbert Xu

On Sun, Nov 18, 2007 at 12:04:01PM +0800, Herbert Xu wrote:
> On Sun, Nov 18, 2007 at 04:30:40AM +0100, Patrick McHardy wrote:
> >
> > On a related issue, I think the rng interface is not very suitable
> > for chips like HIFN that have a constant random bandwidth, it would
> > make a lot more sense to return the time to wait to the core, instead
> > of waiting 10us in all cases. 256 cycles at a speed of 266MHz comes
> > down to 0.96us, so we're waiting about 10 times as long as necessary.
> > Since it's busy waiting anyway, I'd think that from a performance POV
> > constant polling or returning the exact amount of time would be more
> > reasonable.
> 
> I agree, a better interface would be to let the hardware do the
> blocking where necessary.

I meant the hardware driver of course.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [RFC HIFN 00/02]: RNG support

2007-11-17 Thread Herbert Xu

On Sun, Nov 18, 2007 at 04:30:40AM +0100, Patrick McHardy wrote:
>
> On a related issue, I think the rng interface is not very suitable
> for chips like HIFN that have a constant random bandwidth, it would
> make a lot more sense to return the time to wait to the core, instead
> of waiting 10us in all cases. 256 cycles at a speed of 266MHz comes
> down to 0.96us, so we're waiting about 10 times as long as necessary.
> Since it's busy waiting anyway, I'd think that from a performance POV
> constant polling or returning the exact amount of time would be more
> reasonable.

I agree, a better interface would be to let the hardware do the
blocking where necessary.
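
The arithmetic in the quoted text can be checked with a small helper
that converts a cycle count into a wait time, rounding up so the caller
never waits too little.  cycles_to_ns() is a hypothetical name and not
part of the rng interface:

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC UINT64_C(1000000000)

/* Nanoseconds needed for `cycles` ticks of a clock running at `clock_hz`,
 * rounded up.  Assumes cycles * NSEC_PER_SEC fits in 64 bits. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t clock_hz)
{
	assert(clock_hz > 0);
	return (cycles * NSEC_PER_SEC + clock_hz - 1) / clock_hz;
}
```

For 256 cycles at 266MHz this gives 963ns, i.e. roughly the 0.96us
quoted, so a fixed 10us wait is about ten times longer than needed.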

Michael, what do you think about this?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [TOMOYO #5 18/18] LSM expansion for TOMOYO Linux.

2007-11-17 Thread Tetsuo Handa

Hello.

Paul Moore wrote:
> Okay, well if that is the case I think you are going to have another problem 
> in that you could end up throwing away skbs that haven't been through your 
> security_post_recv_datagram() hook because you _always_ throw away the result 
> of the second skb_peek().  Once again, if I'm wrong please correct me.
I don't understand what's wrong with throwing away the result of
the second skb_peek().  I'm doing something similar to what
udp_recvmsg() does.

| int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
| 		size_t len, int noblock, int flags, int *addr_len)
| {

| try_again:
| 	skb = skb_recv_datagram(sk, flags, noblock, &err);
| 	if (!skb)
| 		goto out;

| out_free:
| 	skb_free_datagram(sk, skb);
| out:
| 	return err;
|
| csum_copy_err:
| 	UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite);
|
| 	skb_kill_datagram(sk, skb, flags);
|
| 	if (noblock)
| 		return -EAGAIN;
| 	goto try_again;
| }

The only difference is that I'm not using skb_kill_datagram()
because skb_kill_datagram() uses spin_lock_bh()
while skb_recv_datagram() needs to use spin_lock_irqsave().


> Where did the 'if (skb) return skb;' code go?  Don't you need to do your LSM 
> call before you return the skb?
Sorry, I should have explicitly inserted  rather than a blank line 
like:

| 		error = security_post_recv_datagram(sk, skb, flags);
| 		if (error)
| 			goto force_dequeue;

| 	} while (!wait_for_packet(sk, err, &timeo));
|
| 	return NULL;
| force_dequeue:
| 	/* dequeue if MSG_PEEK is set. */
| no_packet:
| 	*err = error;
| 	return NULL;

The below is the updated patch.
Regards.
-
Subject: LSM expansion for TOMOYO Linux.

LSM hooks for sending signal:
   * task_kill_unlocked is added in sys_kill
   * task_tkill_unlocked is added in sys_tkill
   * task_tgkill_unlocked is added in sys_tgkill
LSM hooks for network accept and recv:
   * socket_post_accept is modified to return int.
   * post_recv_datagram is added in skb_recv_datagram.

You can try TOMOYO Linux without this patch, but in that case, you
can't use access control functionality for restricting signal
transmission and incoming network data.

Signed-off-by: Kentaro Takeda <[EMAIL PROTECTED]>
Signed-off-by: Tetsuo Handa <[EMAIL PROTECTED]>
 include/linux/security.h |   74 +++
 kernel/signal.c  |   17 ++
 net/core/datagram.c  |   29 --
 net/socket.c |7 +++-
 security/dummy.c |   32 ++--
 security/security.c  |   25 ++-
 6 files changed, 169 insertions(+), 15 deletions(-)

--- linux-2.6.23.orig/include/linux/security.h  2007-11-17 00:35:44.0 +0900
+++ linux-2.6.23/include/linux/security.h   2007-11-17 00:37:26.0 +0900
@@ -657,6 +657,25 @@ struct request_sock;
  * @sig contains the signal value.
  * @secid contains the sid of the process where the signal originated
  * Return 0 if permission is granted.
+ * @task_kill_unlocked:
+ * Check permission before sending signal @sig to the process of @pid
+ * with sys_kill.
+ * @pid contains the pid of target process.
+ * @sig contains the signal value.
+ * Return 0 if permission is granted.
+ * @task_tkill_unlocked:
+ * Check permission before sending signal @sig to the process of @pid
+ * with sys_tkill.
+ * @pid contains the pid of target process.
+ * @sig contains the signal value.
+ * Return 0 if permission is granted.
+ * @task_tgkill_unlocked:
+ * Check permission before sending signal @sig to the process of @pid
+ * with sys_tgkill.
+ * @tgid contains the thread group id.
+ * @pid contains the pid of target process.
+ * @sig contains the signal value.
+ * Return 0 if permission is granted.
  * @task_wait:
  * Check permission before allowing a process to reap a child process @p
  * and collect its status information.
@@ -778,8 +797,12 @@ struct request_sock;
  * @socket_post_accept:
  * This hook allows a security module to copy security
  * information into the newly created socket's inode.
+ * This hook also allows a security module to filter connections
+ * from unwanted peers.
+ * The connection will be aborted if this hook returns nonzero.
  * @sock contains the listening socket structure.
  * @newsock contains the newly created server socket for connection.
+ * Return 0 if permission is granted.
  * @socket_sendmsg:
  * Check permission before transmitting a message to another socket.
  * @sock contains the socket structure.
@@ -793,6 +816,12 @@ struct request_sock;
  * @size contains the size of message structure.
  * @flags contains the operational flags.
  * Return 0 if permission is granted.  
+ * @post_recv_datagram:
+ *

Re: 2.6.24-rc2-mm1 -- strange apparent network failures

2007-11-17 Thread Andrew Morgan

Kevin Winchester wrote:
> However, I got around the problem by making the code change manually -
> and my network connection is now working.  Looking at the code being
> bypassed:
> 
> if (pE.cap[i] || pP.cap[i] || pP.cap[i])
> 
> looks somewhat weird as it is testing the same condition twice.  Should
> it have been:
> 
> if (pE.cap[i] || pP.cap[i] || pI.cap[i])

Yes, that was also a bug. However, upon reflection (and as per my "0 &&"
hack), I now believe these few lines of code are problematic in general.

Thanks for reporting this bug. I'll post a more clear patch (that isn't
GPG'd).

Cheers

Andrew

Re: Linux 2.6.23.2

2007-11-17 Thread Krzysztof Halasa

Matt Mackall <[EMAIL PROTECTED]> writes:

> What is the proper encoding for a patch that contains hunks in
> multiple character sets?

8-bit binary encoding, the same as for a single-charset patch - we don't
want mail systems to change the encoding.
Unfortunately, I don't think you can display anything like that inline.
That means email may be unreliable in such cases (except maybe when
both sides use UTF-8 and the patch contains only UTF-8); git/ftp/etc.
will be fine.
-- 
Krzysztof Halasa

Re: 2.6.24-rc2-mm1 -- strange apparent network failures

2007-11-17 Thread Kevin Winchester

Andrew Morgan wrote:
> Kevin,
> 
> Can you try this quick hack?
> 
> diff --git a/kernel/capability.c b/kernel/capability.c
> index e57d1aa..4088610 100644
> --- a/kernel/capability.c
> +++ b/kernel/capability.c
> @@ -109,7 +109,7 @@ out:
> kdata[i].permitted = pP.cap[i];
> kdata[i].inheritable = pI.cap[i];
> }
> -   while (i < _LINUX_CAPABILITY_U32S) {
> +   while (0 && (i < _LINUX_CAPABILITY_U32S)) {
> if (pE.cap[i] || pP.cap[i] || pP.cap[i]) {
> /* Cannot represent w/ legacy structure */
> return -ERANGE;
> 


Oh, and the reason your patch turned up incorrect in my mailer and on
lkml seems to be the PGP signature.  I didn't have your public key, so
my mail client just left the full PGP-signed text in, which includes
escaping of '-' characters.  LKML must also ignore the signature.  Once
I added your public key, the patch shows up correctly in my client at least.

(I guess everyone else probably knew this already...but at least I
learned something new today)

- --
Kevin Winchester

Re: 2.6.24-rc2-mm1 -- strange apparent network failures

2007-11-17 Thread Kevin Winchester

Kevin Winchester wrote:
> Looking at the code being bypassed:
> 
> if (pE.cap[i] || pP.cap[i] || pP.cap[i])
> 
> looks somewhat weird as it is testing the same condition twice.  Should
> it have been:
> 
> if (pE.cap[i] || pP.cap[i] || pI.cap[i])
> 
> ?
> 
> I'm about to test that change instead of bypassing the loop, so I'll let
> you know the results.
> 

No, this still results in a dead network connection, although it is
probably a correct change.  I suppose giving the loop even more reasons
to return -ERANGE wasn't going to be helpful.
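
To make that last point concrete, here is a stand-alone sketch of the
corrected check.  struct cap_set, CAP_U32S and check_legacy_fit() are
stand-ins invented for illustration; the real loop in
kernel/capability.c is only partially visible in the quoted hunk:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define CAP_U32S 2      /* stand-in for _LINUX_CAPABILITY_U32S */

struct cap_set {
	uint32_t cap[CAP_U32S];
};

/*
 * Returns 0 if the effective, permitted and inheritable sets all fit in
 * the first `legacy_u32s` words, or -ERANGE if any higher word has a bit
 * set in any of the three sets.
 */
static int check_legacy_fit(const struct cap_set *pE,
			    const struct cap_set *pP,
			    const struct cap_set *pI,
			    unsigned int legacy_u32s)
{
	unsigned int i;

	assert(legacy_u32s <= CAP_U32S);
	for (i = legacy_u32s; i < CAP_U32S; i++) {
		if (pE->cap[i] || pP->cap[i] || pI->cap[i])
			return -ERANGE;
	}
	return 0;
}
```

With the original typo the inheritable set's upper words were never
examined, so fixing it can only add cases that return -ERANGE, which
matches the observation that the fix alone did not bring the network
back.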

- --
Kevin Winchester


Re: [patch] Printk kernel version in WARN_ON

2007-11-17 Thread Ingo Molnar


* Andrew Morton <[EMAIL PROTECTED]> wrote:

> Should be done for all architectures, methinks.
> 
> If so, an appropriate way to do that would be to do 
> s/dump_stack/arch_dump_stack/ and do a single all-arch implementation 
> of dump_stack().  (Where we might add new goodies in the future).

i agree we can clean this up - but this is a single-line thing that is 
very useful for QA so i think utility warrants .24 inclusion. The oops 
printouts are not generalized anyway.

> Problem is that this will add a new and pointless entry to all the 
> stack dumps, unless the arch_dump_stack() implementation is smart 
> enough to skip the innermost frame.

x86 can skip stackframes via stacktrace.c.

Ingo

Re: [patch] Printk kernel version in WARN_ON

2007-11-17 Thread Andrew Morton

On Sun, 18 Nov 2007 01:42:18 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote:

> 
> * Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> 
> > ok so how about putting the same into dump_stack() instead? (see 
> > below) added bonus is that it's now present for all dumps that use 
> > dump_stack(), not just WARN_ON() (the format I copied from the exact 
> > line used by oopses)
> 
> nice! I did things like this in -rt because it really helps to know 
> which process does a WARN_ON() or raw dump_stack().
> 
> > Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
> 
> Acked-by: Ingo Molnar <[EMAIL PROTECTED]>
> 
> unless there are objections we'll put this into the x86 git tree.
> 

Should be done for all architectures, methinks.

If so, an appropriate way to do that would be to do
s/dump_stack/arch_dump_stack/ and do a single all-arch implementation of
dump_stack().  (Where we might add new goodies in the future).

Problem is that this will add a new and pointless entry to all the stack
dumps, unless the arch_dump_stack() implementation is smart enough to skip the
innermost frame.
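
A toy user-space model of that arrangement, with a simulated stack in
place of a real unwinder.  The frame names, the header text, and the
emit() helper are placeholders; only the dump_stack()/arch_dump_stack()
split comes from the suggestion above:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simulated call stack, innermost frame first (names are placeholders). */
static const char *const frames[] = {
	" dump_stack\n", " warn_on_slowpath\n", " do_work\n",
};
#define NFRAMES (sizeof(frames) / sizeof(frames[0]))

/* Append `line` to buf if it fits; returns the new length. */
static size_t emit(char *buf, size_t len, size_t used, const char *line)
{
	size_t n = strlen(line);

	if (used + n + 1 > len)         /* keep room for the terminating NUL */
		return used;
	memcpy(buf + used, line, n + 1);
	return used + n;
}

/* Stand-in for the per-architecture code: skip `skip` innermost frames. */
static size_t arch_dump_stack(char *buf, size_t len, size_t used, size_t skip)
{
	size_t i;

	for (i = skip; i < NFRAMES; i++)
		used = emit(buf, len, used, frames[i]);
	return used;
}

/* Single generic entry point: common header first, then the arch trace. */
static size_t dump_stack(char *buf, size_t len)
{
	size_t used;

	assert(len > 0);
	buf[0] = '\0';
	used = emit(buf, len, 0, "Pid: 2363, comm: prctl\n");
	return arch_dump_stack(buf, len, used, 1);  /* hide dump_stack itself */
}
```

The skip count of 1 is what keeps the generic wrapper's own frame out
of the trace; an architecture that cannot skip frames would show the
extra entry.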


Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8

2007-11-17 Thread Ingo Molnar

* Greg KH <[EMAIL PROTECTED]> wrote:

> Great, thanks for tracking this down.
> 
> Ingo, this corresponds to changeset 
> a115d5caca1a2905ba7a32b408a6042b20179aaa in mainline.  Is that patch 
> incorrect?  Should this patch in the -stable tree be reverted?

hm, there are no such problems in .24 and the cpu_clock() and other 
fixes i did were not picked up. Find the missing fixes below. They 
should work just fine in .23 as it has the cpu_clock() functionality 
too.

[ NOTE: the most robust thing is to make the .23 version match the .24
  version of kernel/softlockup.c, so i included two other harmless
  changes in this diff as well. ]

Ingo

--->
commit a5f2ce3c6024a5bb895647b6bd88ecae5001020a
Author: Ingo Molnar <[EMAIL PROTECTED]>
Date:   Tue Oct 16 23:26:08 2007 -0700

softlockup watchdog: style cleanups

kernel/softirq.c grew a few style uncleanlinesses in the past few
months, clean that up. No functional changes:

   text    data     bss     dec     hex filename
   1126      76       4    1206     4b6 softlockup.o.before
   1129      76       4    1209     4b9 softlockup.o.after

( the 3 bytes .text increase is due to the "<1>" appended to one of
  the printk messages. )

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>

commit 43581a10075492445f65234384210492ff333eba
Author: Ingo Molnar <[EMAIL PROTECTED]>
Date:   Tue Oct 16 23:26:08 2007 -0700

softlockup: improve debug output

Improve the debuggability of kernel lockups by enhancing the debug
output of the softlockup detector: print the task that causes the lockup
and try to print a more intelligent backtrace.

The old format was:

  BUG: soft lockup detected on CPU#1!
   [] show_trace_log_lvl+0x19/0x2e
   [] show_trace+0x12/0x14
   [] dump_stack+0x14/0x16
   [] softlockup_tick+0xbe/0xd0
   [] run_local_timers+0x12/0x14
   [] update_process_times+0x3e/0x63
   [] tick_sched_timer+0x7c/0xc0
   [] hrtimer_interrupt+0x135/0x1ba
   [] smp_apic_timer_interrupt+0x6e/0x80
   [] apic_timer_interrupt+0x33/0x38
   [] syscall_call+0x7/0xb
   ===

The new format is:

  BUG: soft lockup detected on CPU#1! [prctl:2363]

  Pid: 2363, comm:prctl
  EIP: 0060:[] CPU: 1
  EIP is at sys_prctl+0x24/0x18c
   EFLAGS: 0213Not tainted  (2.6.22-cfs-v20 #26)
  EAX: 0001 EBX: 03e7 ECX: 0001 EDX: f6df
  ESI: 03e7 EDI: 03e7 EBP: f6df0fb0 DS: 007b ES: 007b FS: 00d8
  CR0: 8005003b CR2: 4d8c3340 CR3: 3731d000 CR4: 06d0
   [] show_trace_log_lvl+0x19/0x2e
   [] show_trace+0x12/0x14
   [] show_regs+0x1ab/0x1b3
   [] softlockup_tick+0xef/0x108
   [] run_local_timers+0x12/0x14
   [] update_process_times+0x3e/0x63
   [] tick_sched_timer+0x7c/0xc0
   [] hrtimer_interrupt+0x135/0x1ba
   [] smp_apic_timer_interrupt+0x6e/0x80
   [] apic_timer_interrupt+0x33/0x38
   [] syscall_call+0x7/0xb
   ===

Note that in the old format we only knew that some system call locked
up, we didnt know _which_. With the new format we know that it's at a
specific place in sys_prctl(). [which was where i created an artificial
kernel lockup to test the new format.]

This is also useful if the lockup happens in user-space - the user-space
EIP (and other registers) will be printed too. (such a lockup would
either suggest that the task was running at SCHED_FIFO:99 and looping
for more than 10 seconds, or that the softlockup detector has a
false-positive.)

The task name is printed too first, just in case we dont manage to print
a useful backtrace.

[EMAIL PROTECTED]: fix warning]
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index e423b3a..11df812 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -15,13 +15,16 @@
 #include 
 #include 

+#include 
+
 static DEFINE_SPINLOCK(print_lock);

 static DEFINE_PER_CPU(unsigned long, touch_timestamp);
 static DEFINE_PER_CPU(unsigned long, print_timestamp);
 static DEFINE_PER_CPU(struct task_struct *, watchdog_task);

-static int did_panic = 0;
+static int did_panic;
+int softlockup_thresh = 10;

 static int
 softlock_panic(struct notifier_block *this, unsigned long event, void *ptr)
@@ -72,6 +75,7 @@ void softlockup_tick(void)
int this_cpu = smp_processor_id();
unsigned long touch_timestamp = per_cpu(touch_timestamp, this_cpu);
unsigned long print_timestamp;
+   struct pt_regs *regs = get_irq_regs();
unsigned

Re: 2.6.24-rc2-mm1 -- strange apparent network failures

2007-11-17 Thread Kevin Winchester

Andrew Morgan wrote:
> Kevin,
> 
> Can you try this quick hack?
> 
> diff --git a/kernel/capability.c b/kernel/capability.c
> index e57d1aa..4088610 100644
> --- a/kernel/capability.c
> +++ b/kernel/capability.c
> @@ -109,7 +109,7 @@ out:
> kdata[i].permitted = pP.cap[i];
> kdata[i].inheritable = pI.cap[i];
> }
> -   while (i < _LINUX_CAPABILITY_U32S) {
> +   while (0 && (i < _LINUX_CAPABILITY_U32S)) {
> if (pE.cap[i] || pP.cap[i] || pP.cap[i]) {
> /* Cannot represent w/ legacy structure */
> return -ERANGE;
> 

Well, something went wrong with the patch - it has extra negative signs
in my mail reader, and on lkml, but now that I've hit reply and it's
been quoted, it looks fine in my mail client.  So I have no idea what
went on.

However, I got around the problem by making the code change manually -
and my network connection is now working.  Looking at the code being
bypassed:

if (pE.cap[i] || pP.cap[i] || pP.cap[i])

looks somewhat weird as it is testing the same condition twice.  Should
it have been:

if (pE.cap[i] || pP.cap[i] || pI.cap[i])

?

I'm about to test that change instead of bypassing the loop, so I'll let
you know the results.

- --
Kevin Winchester


Re: [patch] Printk kernel version in WARN_ON

2007-11-17 Thread Ingo Molnar


* Arjan van de Ven <[EMAIL PROTECTED]> wrote:

> ok so how about putting the same into dump_stack() instead? (see 
> below) added bonus is that it's now present for all dumps that use 
> dump_stack(), not just WARN_ON() (the format I copied from the exact 
> line used by oopses)

nice! I did things like this in -rt because it really helps to know 
which process does a WARN_ON() or raw dump_stack().

> Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>

Acked-by: Ingo Molnar <[EMAIL PROTECTED]>

unless there are objections we'll put this into the x86 git tree.

Ingo

Re: perfmon2 merge news

2007-11-17 Thread David Miller

From: "Patrick DEMICHEL" <[EMAIL PROTECTED]>
Date: Sat, 17 Nov 2007 18:19:25 +0100

> Yet another noisy linux HPC user

Nobody on this list is interested in discussing this.

Really, the on-topic discussion here is the code and
the technical issues.  And we will work on those to
get perfmon2 into shape in due time.

I guarantee you that 99% of the kernel developers didn't
wade through your description at all, myself included.

Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8

2007-11-17 Thread Jeremy Fitzhardinge

Greg KH wrote:
> Great, thanks for tracking this down.
>
> Ingo, this corresponds to changeset
> a115d5caca1a2905ba7a32b408a6042b20179aaa in mainline.  Is that patch
> incorrect?  Should this patch in the -stable tree be reverted?
>   

Hm, I've never observed a problem with this in mainline. 

Ah.  The significant difference between 2.6.23 and -git is that the
former used sched_clock as the softlockup timebase, versus cpu_clock in
git.  If sched_clock() is tsc-based, and the tsc isn't stable when using
cpufreq, then the softlockup detector will get confused and fire spuriously. 
Ingo's fix to reporting exposed the fact that softlockup is terminally
broken in that kernel.

I think the best course for now is to revert it, since softlockup is
hardly a critical feature.  The proper fixes would either be to backport
cpu_clock() to 2.6.23, or make it go back to using ticks.
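
A simplified model of the failure mode, assuming a 10 second threshold
and a clock that converts TSC cycles to seconds using a fixed,
possibly stale, frequency.  softlockup_fires() and tsc_to_sec() are
hypothetical helpers and do not mirror kernel/softlockup.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SOFTLOCKUP_THRESH_SEC 10

/* True if the watchdog has not been touched for longer than the threshold. */
static bool softlockup_fires(uint64_t touch_sec, uint64_t now_sec)
{
	if (now_sec < touch_sec)        /* clock went backwards: ignore */
		return false;
	return now_sec - touch_sec > SOFTLOCKUP_THRESH_SEC;
}

/* Seconds according to a TSC-derived clock that believes `assumed_hz`. */
static uint64_t tsc_to_sec(uint64_t cycles, uint64_t assumed_hz)
{
	assert(assumed_hz > 0);
	return cycles / assumed_hz;
}
```

For example, 12,000,000,000 cycles are 6 real seconds on a CPU now
running at 2GHz, but a clock still assuming 1GHz reports 12 seconds,
which is past the threshold even though nothing was stuck.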

J

[PATCH] Fix PCIe double initialization bug

2007-11-17 Thread Mark Lord


pciehp_fix_double_init_bug.patch:

Earlier patches to split out the hardware init for PCIe hotplug
resulted in some one-time initializations being redone on every
resume cycle.  Eg. irq/polling initialization.

This patch splits the hardware init into two parts,
and separates the one-time initializations from those
so that they only ever get done once, as intended.
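
The split described above can be pictured with a small user-space sketch. It is only an illustration under stated assumptions: the struct, the counters and the toy_* function names are hypothetical and do not appear in the driver.

```c
/* Hypothetical controller state; counters stand in for real side effects. */
struct toy_ctrl {
	int one_time_inits;	/* irq/polling setup: should stay at 1 */
	int hw_inits;		/* register setup: once per probe or resume */
};

/* Repeatable part: safe to run again on every resume. */
static void toy_init_hardware(struct toy_ctrl *c)
{
	c->hw_inits++;
}

/* Probe path: one-time setup first, then the repeatable part. */
static void toy_probe(struct toy_ctrl *c)
{
	c->one_time_inits++;
	toy_init_hardware(c);
}

/* Resume path: only the repeatable part is redone. */
static void toy_resume(struct toy_ctrl *c)
{
	toy_init_hardware(c);
}
```

After one probe and two resumes, the one-time counter is still 1 while the hardware counter is 3, which is the behaviour the description asks for.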

Signed-off-by: Mark Lord <[EMAIL PROTECTED]>
---

This patch is for -mm and for Kristen's queue.  Not for 2.6.24.

drivers/pci/hotplug/pciehp.h  |2
drivers/pci/hotplug/pciehp_core.c |2
drivers/pci/hotplug/pciehp_hpc.c  |  119 +++-
3 files changed, 69 insertions(+), 54 deletions(-)

--- linux/drivers/pci/hotplug/pciehp.h.orig 2007-11-13 23:57:09.0 -0500
+++ linux/drivers/pci/hotplug/pciehp.h  2007-11-17 19:10:01.0 -0500
@@ -163,7 +163,7 @@
int pcie_init(struct controller *ctrl, struct pcie_device *dev);
int pciehp_enable_slot(struct slot *p_slot);
int pciehp_disable_slot(struct slot *p_slot);
-int pcie_init_hardware(struct controller *ctrl, struct pcie_device *dev);
+int pcie_init_hardware_part2(struct controller *ctrl, struct pcie_device *dev);

static inline struct slot *pciehp_find_slot(struct controller *ctrl, u8 device)
{
--- linux/drivers/pci/hotplug/pciehp_core.c.orig 2007-11-13 23:57:09.0 -0500
+++ linux/drivers/pci/hotplug/pciehp_core.c 2007-11-17 19:09:43.0 -0500
@@ -521,7 +521,7 @@
u8 status;

/* reinitialize the chipset's event detection logic */
-   pcie_init_hardware(ctrl, dev);
+   pcie_init_hardware_part2(ctrl, dev);

t_slot = pciehp_find_slot(ctrl, ctrl->slot_device_offset);

--- linux/drivers/pci/hotplug/pciehp_hpc.c.orig 2007-11-13 23:57:09.0 -0500
+++ linux/drivers/pci/hotplug/pciehp_hpc.c  2007-11-17 19:13:49.0 -0500
@@ -1067,28 +1067,25 @@
}
#endif

-int pcie_init_hardware(struct controller *ctrl, struct pcie_device *dev)
+static int pcie_init_hardware_part1(struct controller *ctrl,
+   struct pcie_device *dev)
{
int rc;
u16 temp_word;
-   u16 intr_enable = 0;
u32 slot_cap;
u16 slot_status;
-   struct pci_dev *pdev;
-
-   pdev = dev->port;

rc = pciehp_readl(ctrl, SLOTCAP, &slot_cap);
if (rc) {
err("%s: Cannot read SLOTCAP register\n", __FUNCTION__);
-   goto abort_free_ctlr;
+   return -1;
}

/* Mask Hot-plug Interrupt Enable */
rc = pciehp_readw(ctrl, SLOTCTRL, &temp_word);
if (rc) {
err("%s: Cannot read SLOTCTRL register\n", __FUNCTION__);
-   goto abort_free_ctlr;
+   return -1;
}

dbg("%s: SLOTCTRL %x value read %x\n",
@@ -1099,62 +1096,46 @@
rc = pciehp_writew(ctrl, SLOTCTRL, temp_word);
if (rc) {
err("%s: Cannot write to SLOTCTRL register\n", __FUNCTION__);
-   goto abort_free_ctlr;
+   return -1;
}

rc = pciehp_readw(ctrl, SLOTSTATUS, &slot_status);
if (rc) {
err("%s: Cannot read SLOTSTATUS register\n", __FUNCTION__);
-   goto abort_free_ctlr;
+   return -1;
}

temp_word = 0x1F; /* Clear all events */
rc = pciehp_writew(ctrl, SLOTSTATUS, temp_word);
if (rc) {
err("%s: Cannot write to SLOTSTATUS register\n", __FUNCTION__);
-   goto abort_free_ctlr;
+   return -1;
}
+   return 0;
+}

-   if (pciehp_poll_mode) {
-   /* Install interrupt polling timer. Start with 10 sec delay */
-   init_timer(&ctrl->poll_timer);
-   start_int_poll_timer(ctrl, 10);
-   } else {
-   /* Installs the interrupt handler */
-   rc = request_irq(ctrl->pci_dev->irq, pcie_isr, IRQF_SHARED,
-MY_NAME, (void *)ctrl);
-   dbg("%s: request_irq %d for hpc%d (returns %d)\n",
-   __FUNCTION__, ctrl->pci_dev->irq,
-   atomic_read(&pciehp_num_controllers), rc);
-   if (rc) {
-   err("Can't get irq %d for the hotplug controller\n",
-   ctrl->pci_dev->irq);
-   goto abort_free_ctlr;
-   }
-   }
-   dbg("pciehp ctrl b:d:f:irq=0x%x:%x:%x:%x\n", pdev->bus->number,
-   PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn), dev->irq);
-
-   /*
-* If this is the first controller to be initialized,
-* initialize the pciehp work queue
-*/
-   if (atomic_add_return(1, &pciehp_num_controllers) == 1) {
-   pciehp_wq = create_singlethread_workqueue("pciehpd");
-   if (!pciehp_wq) {
-   rc = -ENOMEM;
-   goto abort_free_irq;
-   }
-   }
+int

Re: [NET]: rt_check_expire() can take a long time, add a cond_resched()

2007-11-17 Thread David Miller

From: Andi Kleen <[EMAIL PROTECTED]>
Date: Sat, 17 Nov 2007 13:56:08 +0100

> Arjan van de Ven <[EMAIL PROTECTED]> writes:
> >> > 
> >> > It's not that cheap. The ChangeLog included my own numbers, on a
> >> > Pentium M machine. (i686, 1.6 GHz, 1.5 GB ram)
> >> > 
> >> > Without "if (need_resched())" (so calling need_resched() X.XXX.XXX 
> >> > times), each run takes 88ms
> >> > 
> >> > With the extra check (and *much* less function calls), each run
> >> > takes 25ms
> 
> ms?!? The numbers sound wrong. Wrong unit? 

Read what Eric is saying.  He is saying "any entire run" purging
the routing cache takes that long, not just one call.

Re: 2.6.24-rc2-mm1 -- strange apparent network failures

2007-11-17 Thread Andrew Morgan

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Kevin,

Can you try this quick hack?

diff --git a/kernel/capability.c b/kernel/capability.c
index e57d1aa..4088610 100644
- --- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -109,7 +109,7 @@ out:
kdata[i].permitted = pP.cap[i];
kdata[i].inheritable = pI.cap[i];
}
- -   while (i < _LINUX_CAPABILITY_U32S) {
+   while (0 && (i < _LINUX_CAPABILITY_U32S)) {
if (pE.cap[i] || pP.cap[i] || pP.cap[i]) {
/* Cannot represent w/ legacy structure */
return -ERANGE;

Thanks

Andrew
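
For context, the check that the hack above switches off can be pictured roughly as follows. This is a user-space sketch under stated assumptions, not the kernel code: the toy_* names and the word counts are hypothetical, and the sketch tests the inheritable set as its third term, which is a guess at the intent (the quoted context line tests pP twice).

```c
#include <errno.h>
#include <stdint.h>

#define TOY_CAP_U32S		2	/* assumed: 32-bit words in a 64-bit set */
#define TOY_LEGACY_CAP_U32S	1	/* assumed: words a legacy caller can see */

struct toy_cap_set {
	uint32_t cap[TOY_CAP_U32S];
};

/*
 * Return 0 if all three sets fit the legacy one-word layout, or
 * -ERANGE if any bit is set in a word the legacy layout cannot hold.
 */
static int toy_fits_legacy(const struct toy_cap_set *effective,
			   const struct toy_cap_set *permitted,
			   const struct toy_cap_set *inheritable)
{
	for (int i = TOY_LEGACY_CAP_U32S; i < TOY_CAP_U32S; i++) {
		if (effective->cap[i] || permitted->cap[i] ||
		    inheritable->cap[i])
			return -ERANGE;
	}
	return 0;
}
```

A legacy caller whose process carries any bit in the upper word would get -ERANGE from such a check; forcing the loop condition to 0 skips the check entirely, which makes it a quick way to see whether this path is what the old libcap trips over.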

Kevin Winchester wrote:
> On November 17, 2007 01:16:58 am Andrew Morgan wrote:
>> Hi,
>>
>> This warning is just saying that you might want to reconsider
>> recompiling your dhclient with a newer libcap - which has native support
>> for 64-bit capabilities. This is supposed to be informative, and not be
>> associated with any particular error.
>>
>> From your comments, you believe that this patch causes something in your
>> boot process to fail. Can you supply some detail about the version of
>> dhclient you are using? I'd like to understand exactly what it is doing
>> (via libcap).
>>
>> Thanks
>>
> 
> The boot succeeds (and appears to initialize the network adapter 
> properly - it autonegotiates a 100Mbps link speed), but the dhcp client is 
> never able to get an address.  However, applying the rc2-mm1 patch series up 
> to just before:
> 
>   add-64-bit-capability-support-to-the-kernel.patch
> 
> results in a working kernel.  Applying just this patch causes the failure.
> To be sure, I also tried applying the above patch plus the following ones:
> 
>   add-64-bit-capability-support-to-the-kernel-checkpatch-fixes.patch
>   add-64-bit-capability-support-to-the-kernel-fix.patch
>   add-64-bit-capability-support-to-the-kernel-fix-fix.patch
>   remove-unnecessary-include-from-include-linux-capabilityh.patch
> 
> but the problem still occurs even with all of these.
> 
> As to versions, I'm running Kubuntu gutsy, so I have the default:
> 
> dhcp3-client   3.0.5-3ubuntu4
> libcap1        1:1.10-14build1
> 
> packages installed.
> 
> Let me know if you need any other information, or if you have a patch you 
> would like tested.
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFHP37LQheEq9QabfIRAst5AJ9Nsw0RtF2NDuUAMvQZh5OFWEB4ugCeIxMH
lp5/Ka7SJZLIrQpZDijrd1E=
=GN18
-----END PGP SIGNATURE-----

Re: [TOMOYO #5 18/18] LSM expansion for TOMOYO Linux.

2007-11-17 Thread Paul Moore

On Friday 16 November 2007 10:45:32 pm Tetsuo Handa wrote:
> Paul Moore wrote:
> > I might be missing something here, but why do you need to do a skb_peek()
> > again?  You already have the skb and the sock, just do the unlink.
>
> The skb might be already dequeued by other thread while I slept inside
> security_post_recv_datagram().

Okay, well if that is the case I think you are going to have another problem 
in that you could end up throwing away skbs that haven't been through your 
security_post_recv_datagram() hook because you _always_ throw away the result 
of the second skb_peek().  Once again, if I'm wrong please correct me.

> > Second, why not move the 'no_peek' code to just before 'no_packet'?
>
> Oh, I didn't notice I can insert here. Now I can also move the rest code
> like
>
> | error = security_post_recv_datagram(sk, skb, flags);
> | if (error)
> |   goto force_dequeue;
> |
> | } while (!wait_for_packet(sk, err, &timeo));

Where did the 'if (skb) return skb;' code go?  Don't you need to do your LSM 
call before you return the skb?

-- 
paul moore
linux security @ hp

Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

2007-11-17 Thread Torsten Kaiser

On Nov 18, 2007 12:05 AM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
>
> On Sat, Nov 17, 2007 at 08:40:22PM +0100, Torsten Kaiser wrote:
>
> > Lockdep triggers immediately before the freeze, but the result is still
> > not helpful:
> >
> > [  221.565011] INFO: trying to register non-static key.
> > [  221.566999] the code is fine but needs lockdep annotation.
> > [  221.569206] turning off the locking correctness validator.
> > [  221.571404]
> > [  221.571405] Call Trace:
> > [  221.572996]  [] __lock_acquire+0x4c4/0x1140
> > [  221.575298]  [] lock_acquire+0x55/0x70
> > [  221.577429]  [] __wake_up+0x2d/0x70
> > [  221.579457]  [] _spin_lock_irqsave+0x34/0x50
> > [  221.581800]  [] _spin_unlock_irqrestore+0x55/0x70
> > [  221.584317]  [] __wake_up+0x2d/0x70
> > [  221.586344]  [] rpc_async_schedule+0x0/0x10
> > [  221.588648]  [] nfs_free_unlinkdata+0x1e/0x50
> > [  221.591023]  [] rpc_release_calldata+0x26/0x50
> > [  221.593428]  [] run_workqueue+0x16f/0x210
> > [  221.595662]  [] trace_hardirqs_on+0xc1/0x160
> > [  221.598004]  [] worker_thread+0x0/0xb0
> > [  221.600130]  [] worker_thread+0x0/0xb0
> > [  221.602265]  [] worker_thread+0x6d/0xb0
> > [  221.604431]  [] autoremove_wake_function+0x0/0x30
> > [  221.606939]  [] worker_thread+0x0/0xb0
> > [  221.609067]  [] worker_thread+0x0/0xb0
> > [  221.611199]  [] kthread+0x4b/0x80
> > [  221.613156]  [] child_rip+0xa/0x12
> > [  221.615151]  [] restore_args+0x0/0x30
> > [  221.617247]  [] kthread+0x0/0x80
> > [  221.619162]  [] child_rip+0x0/0x12
> > [  221.621147]
> > [  221.621749] INFO: lockdep is turned off.
>
> I've been staring at this NFS code for a while and can't make any sense
> out of it. It seems to correctly initialize the waitqueue. So this would
> indicate corruption of some sort.

Not sure if this is helpful, but after looking into the code, the
above stacktrace looks somewhat damaged.
Might be my fault: # CONFIG_FRAME_POINTER is not set
On the other hand the stacktrace from the run with the SLUB lockdep
fix shows the same function names.

That trace contains this line:
 [] nfs_free_unlinkdata+0x1e/0x50
(gdb) list *0x8030167e
0x8030167e is in nfs_free_unlinkdata (fs/nfs/unlink.c:33).
28   */
29  static void
30  nfs_free_unlinkdata(struct nfs_unlinkdata *data)
31  {
32  nfs_sb_deactive(NFS_SERVER(data->dir));
33  iput(data->dir);
34  put_rpccred(data->cred);
35  kfree(data->args.name.name);
36  kfree(data);
37  }

Is some inode lock guilty?
Please ask, if you need more information.

Torsten

[PATCH] lockdep: annotate do_debug() trap handler

2007-11-17 Thread Peter Zijlstra

Subject: lockdep: annotate do_debug() trap handler

Ensure the hardirq state is consistent before using locks. Use the rare
trace_hardirqs_fixup() because the trap can happen in any context.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 arch/x86/kernel/traps_32.c |2 ++
 arch/x86/kernel/traps_64.c |2 ++
 2 files changed, 4 insertions(+)

Index: linux-2.6/arch/x86/kernel/traps_32.c
===
--- linux-2.6.orig/arch/x86/kernel/traps_32.c
+++ linux-2.6/arch/x86/kernel/traps_32.c
@@ -830,6 +830,8 @@ fastcall void __kprobes do_debug(struct 
unsigned int condition;
struct task_struct *tsk = current;
 
+   trace_hardirqs_fixup();
+
get_debugreg(condition, 6);
 
if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
Index: linux-2.6/arch/x86/kernel/traps_64.c
===
--- linux-2.6.orig/arch/x86/kernel/traps_64.c
+++ linux-2.6/arch/x86/kernel/traps_64.c
@@ -840,6 +840,8 @@ asmlinkage void __kprobes do_debug(struc
struct task_struct *tsk = current;
siginfo_t info;
 
+   trace_hardirqs_fixup();
+
get_debugreg(condition, 6);
 
if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,

Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

2007-11-17 Thread Peter Zijlstra

On Sat, Nov 17, 2007 at 08:40:22PM +0100, Torsten Kaiser wrote:

> Lockdep triggers immediately before the freeze, but the result is still
> not helpful:
> 
> [  221.565011] INFO: trying to register non-static key.
> [  221.566999] the code is fine but needs lockdep annotation.
> [  221.569206] turning off the locking correctness validator.
> [  221.571404]
> [  221.571405] Call Trace:
> [  221.572996]  [] __lock_acquire+0x4c4/0x1140
> [  221.575298]  [] lock_acquire+0x55/0x70
> [  221.577429]  [] __wake_up+0x2d/0x70
> [  221.579457]  [] _spin_lock_irqsave+0x34/0x50
> [  221.581800]  [] _spin_unlock_irqrestore+0x55/0x70
> [  221.584317]  [] __wake_up+0x2d/0x70
> [  221.586344]  [] rpc_async_schedule+0x0/0x10
> [  221.588648]  [] nfs_free_unlinkdata+0x1e/0x50
> [  221.591023]  [] rpc_release_calldata+0x26/0x50
> [  221.593428]  [] run_workqueue+0x16f/0x210
> [  221.595662]  [] trace_hardirqs_on+0xc1/0x160
> [  221.598004]  [] worker_thread+0x0/0xb0
> [  221.600130]  [] worker_thread+0x0/0xb0
> [  221.602265]  [] worker_thread+0x6d/0xb0
> [  221.604431]  [] autoremove_wake_function+0x0/0x30
> [  221.606939]  [] worker_thread+0x0/0xb0
> [  221.609067]  [] worker_thread+0x0/0xb0
> [  221.611199]  [] kthread+0x4b/0x80
> [  221.613156]  [] child_rip+0xa/0x12
> [  221.615151]  [] restore_args+0x0/0x30
> [  221.617247]  [] kthread+0x0/0x80
> [  221.619162]  [] child_rip+0x0/0x12
> [  221.621147]
> [  221.621749] INFO: lockdep is turned off.

I've been staring at this NFS code for a while and can't make any sense
out of it. It seems to correctly initialize the waitqueue. So this would
indicate corruption of some sort.



> I also had another BUG output during system startup, but that should
> be unrelated:
> [  103.254681] BUG: sleeping function called from invalid context at
> kernel/rwsem.c:20
> [  103.257757] in_atomic():0, irqs_disabled():1
> [  103.259469] 1 lock held by artsd/5883:
> [  103.259470]  #0:  (pm_qos_lock){}, at: []
> pm_qos_add_requirement+0x6b/0xf0
> [  103.263316] irq event stamp: 49712
> [  103.263318] hardirqs last  enabled at (49711): []
> __kmalloc+0x10d/0x180
> [  103.263321] hardirqs last disabled at (49712): []
> _spin_lock_irqsave+0x1a/0x50
> [  103.263326] softirqs last  enabled at (48820): []
> unix_release_sock+0x79/0x240
> [  103.263330] softirqs last disabled at (48818): []
> _write_lock_bh+0x9/0x30
> [  103.26]
> [  103.26] Call Trace:
> [  103.263335]  [] down_read+0x15/0x40
> [  103.263338]  [] __blocking_notifier_call_chain+0x46/0x90
> [  103.263341]  [] pm_qos_add_requirement+0x93/0xf0
> [  103.263344]  [] snd_pcm_hw_params+0x2fa/0x380
> [  103.263347]  [] snd_pcm_common_ioctl1+0xb4c/0xdc0
> [  103.263350]  [] __do_fault+0x227/0x470
> [  103.263353]  [] __lock_acquire+0x745/0x1140
> [  103.263357]  [] _spin_unlock_irqrestore+0x55/0x70
> [  103.263359]  [] trace_hardirqs_on+0xc1/0x160
> [  103.263362]  [] snd_pcm_playback_ioctl1+0x48/0x240
> [  103.263365]  [] snd_pcm_playback_ioctl+0x36/0x50
> [  103.263367]  [] vfs_ioctl+0x2f/0xa0
> [  103.263369]  [] do_vfs_ioctl+0x260/0x2e0
> [  103.263371]  [] trace_hardirqs_on+0xc1/0x160
> [  103.263373]  [] sys_ioctl+0x91/0xb0
> [  103.263376]  [] system_call+0x7e/0x83
> [  103.263379]

This pm-qos code is fubar, it calls blocking_notifier_call_chain while
holding a spinlock (and that is after 'fixing' it from a
srcu_notifier_call_chain - which is equally wrong).


Re: [patch] Printk kernel version in WARN_ON

2007-11-17 Thread Denys Vlasenko

On Saturday 17 November 2007 10:15, Arjan van de Ven wrote:
> Hi,
>
>  #define WARN_ON(condition) ({ \
>   int __ret_warn_on = !!(condition);  \
>   if (unlikely(__ret_warn_on)) {  \
> - printk("WARNING: at %s:%d %s()\n", __FILE__,\
> - __LINE__, __FUNCTION__);\
> + printk("WARNING: at %s:%d %s()  (%s)\n", __FILE__,  \
> + __LINE__, __FUNCTION__, UTS_RELEASE);   \
>   dump_stack();   \
>   }   \
>   unlikely(__ret_warn_on);\

We have ~700 WARN_ONs in the tree. Adding UTS_RELEASE to printk
grows every one of them by at least 5 bytes.

I think it makes sense to move printk out-of-line, to

void print_WARN_ON_warning(const char *file, int line, const char *func);

This will save at least 10 bytes per WARN_ON.
--
vda
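
A user-space sketch of the out-of-line idea, to make the size argument concrete. Only the helper's signature comes from the proposal above; TOY_WARN_ON, the counter, and the use of fprintf() in place of printk() are assumptions for illustration, and dump_stack() and UTS_RELEASE are left out. The macro relies on GNU statement expressions, as the original does.

```c
#include <stdio.h>

static int toy_warn_count;	/* only here so the effect is observable */

/*
 * Out-of-line helper: the format string and the printing call exist
 * once, here.  In a real tree this would be a non-static function in a
 * single .c file so that the compiler cannot inline it at each site.
 */
static void print_WARN_ON_warning(const char *file, int line, const char *func)
{
	toy_warn_count++;
	fprintf(stderr, "WARNING: at %s:%d %s()\n", file, line, func);
}

/* Each use site now only loads three arguments and makes one call. */
#define TOY_WARN_ON(condition) ({					\
	int ret_warn_on_ = !!(condition);				\
	if (ret_warn_on_)						\
		print_WARN_ON_warning(__FILE__, __LINE__, __func__);	\
	ret_warn_on_;							\
})
```

With this shape, adding something like the release string to the message costs bytes once in the helper instead of once per call site.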

Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

2007-11-17 Thread root

On Sat, Nov 17, 2007 at 07:09:46PM +0100, Ingo Molnar wrote:
> 
> * Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> 
> > Sadly lockdep does not work for me, as it gets turned off early:
> > [   39.851594] -
> > [   39.855963] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
> > [   39.861981] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
> > [   39.866963]  (&n->list_lock){-+..}, at: []
> 
> hey, that means it found a bug - which is not sad at all :-)

---
Subject: lockdep: slub: annotate boot time node->list_lock usage

inconsistent {softirq-on-W} -> {in-softirq-W} usage.
swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
 (&n->list_lock){-+..}, at: [] add_partial+0x31/0xa0
{softirq-on-W} state was registered at:
  [] __lock_acquire+0x3e8/0x1140
  [] debug_check_no_locks_freed+0x188/0x1a0
  [] lock_acquire+0x55/0x70
  [] add_partial+0x31/0xa0
  [] _spin_lock+0x1e/0x30
  [] add_partial+0x31/0xa0
  [] kmem_cache_open+0x1cc/0x330
  [] _spin_unlock_irq+0x24/0x30
  [] create_kmalloc_cache+0x64/0xf0
  [] init_alloc_cpu_cpu+0x70/0x90
  [] kmem_cache_init+0x65/0x1d0
  [] start_kernel+0x23e/0x350
  [] _sinittext+0x12d/0x140
  [] 0x

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
CC: Christoph Lameter <[EMAIL PROTECTED]>
CC: Kamalesh Babulal <[EMAIL PROTECTED]>
---
 mm/slub.c |8 
 1 file changed, 8 insertions(+)

Index: linux-2.6/mm/slub.c
===
--- linux-2.6.orig/mm/slub.c
+++ linux-2.6/mm/slub.c
@@ -2155,6 +2155,7 @@ static struct kmem_cache_node *early_kme
 {
struct page *page;
struct kmem_cache_node *n;
+   unsigned long flags;
 
BUG_ON(kmalloc_caches->size < sizeof(struct kmem_cache_node));
 
@@ -2179,7 +2180,14 @@ static struct kmem_cache_node *early_kme
 #endif
init_kmem_cache_node(n);
atomic_long_inc(&n->nr_slabs);
+   /*
+* lockdep requires consistent irq usage for each lock
+* so even though there cannot be a race this early in
+* the boot sequence, we still disable irqs.
+*/
+   local_irq_save(flags);
add_partial(kmalloc_caches, page, 0);
+   local_irq_restore(flags);
return n;
 }
 

Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

2007-11-17 Thread Remy Bohmer

> > Sure, you want to split the list?
>
> split the list with you? Feel free to take any of those :-) dev->sem is
> nontrivial and probably not possible right now - and some of the others
> might be problematic too. But there might be fixable ones in the list.
> This shouldnt become like the BKL conversion - never truly finished.

Hey Ingo and Daniel, Leave some of the fun open for me :-)

I just looked at the list and I found a few that seem doable.


Remy

Re: [PATCH] net/ipv4/arp.c: Fix arp reply when sender ip 0

2007-11-17 Thread Jarek Poplawski

Bill Fink wrote, On 11/16/2007 08:26 PM:
...

> Regarding the Target IP, RFC 826 says:
> 
>   "The target protocol address is necessary in the request form
>   of the packet so that a machine can determine whether or not
>   to enter the sender information in a table or to send a reply.
>   It is not necessarily needed in the reply form if one assumes
>   a reply is only provoked by a request.  It is included for
>   completeness, network monitoring, and to simplify the suggested
>   processing algorithm described above (which does not look at
>   the opcode until AFTER putting the sender information in a
>   table).
> 
> So it's ambiguous about the target IP address in an ARP reply packet,
> but a value of 0.0.0.0 makes more logical sense to me than using
> 192.168.0.1 in this example case, since it should reflect the requestor
> IP address, which is unknown in this case.

IMHO, you are mostly right, but, according to this, the only ambiguity
is whether there is a target IP or no target IP, so here 0.0.0.0
or 0.0.0.0...

Regards,
Jarek P.

Re: broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-17 Thread Alan Stern

On Sat, 17 Nov 2007, Rafael J. Wysocki wrote:

> On Saturday, 17 of November 2007, Jiri Slaby wrote:
> > On 11/16/2007 05:10 PM, Alan Stern wrote:
> > > On Thu, 15 Nov 2007, Greg KH wrote:
> > > 
> > >>> The offending -mm patch is
> > >>> gregkh-driver-pm-acquire-device-locks-prior-to-suspending.patch
> > >>>
> > >>> 2.6.24-rc2-mm1 minus it works just fine; PROVE_LOCKING shows nothing 
> > >>> new when
> > >>> the patch is applied.
> > >> Thanks for tracking this down.  Alan, any thoughts?
> > > 
> > > It's a driver problem somewhere.  Probably not one of the most common 
> > > drivers because I don't see the same problem here (but then I'm not 
> > > testing -mm).
> > > 
> > > The thing to do is figure out which driver is causing the problem.
> > > Jiri, try enabling CONFIG_DEBUG_DRIVER.  
> > 
> > Sadly no output.
> > 
> > > If there's also a config 
> > > option to prevent the console from being suspended, set it as well.  
> > 
> > no_suspend_console kernel parameter has no effect (why?).
> 
> I'm not sure.
> 
> Please try to set CONFIG_PM_VERBOSE.

I finally got 2.6.24-rc2-mm1 working.  Andrew, how come those two
recent patches from Greg and Kay (the ones changing alloc_disk_node()  
and system_bus_init()) aren't in your hot-fixes directory?

There's still a problem.  During bootup I get this:

floppy0: Floppy io-port 0x03f2 in use

That's not supposed to happen; the floppy disk should be working 
perfectly.  Is this a known problem?

Back to the main topic...  My system hibernates and resumes with no
apparent problem.  Jiri, it looks like you'll have to do some debug
tracing of the routines in drivers/base/power/main.c.

Alan Stern


Re: msync(2) bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland

2007-11-17 Thread Hugh Dickins

On Tue, 13 Nov 2007, Erez Zadok wrote:
> 
> I posted all of these patches just now.  You're CC'ed.  Hopefully Andrew can
> pull from my unionfs.git branch soon.
> 
> You also reported in your previous emails some hangs/oopses while doing make
> -j 20 in unionfs on top of a single tmpfs, using -mm.  After several days,
> I've not been able to reproduce this w/ my latest set of patches.  If you
> can send me your .config and the specs on the h/w you're using (cpus, mem,
> etc.), I'll see if I can find something similar to it on my end and run the
> same tests.

I'm glad to report that this unionfs, not the one in 2.6.24-rc2-mm1
but the one including those 9 patches you posted, now gets through
my testing with tmpfs without a problem.  I do still get occasional
"unionfs: new lower inode mtime (bindex=0, name=)"
messages, but nothing worse seen yet: a big improvement.

I deceived myself for a while that the danger of shmem_writepage
hitting its BUG_ON(entry->val) was dealt with too; but that's wrong,
I must go back to working out an escape from that one (despite never
seeing it).

I did think you could clean up the doubled set_page_dirtys,
but it's of no consequence.

Hugh

--- 2.6.24-rc2-mm1+9/fs/unionfs/mmap.c  2007-11-17 12:23:30.0 +
+++ linux/fs/unionfs/mmap.c 2007-11-17 20:22:29.0 +
@@ -56,6 +56,7 @@ static int unionfs_writepage(struct page
copy_highpage(lower_page, page);
flush_dcache_page(lower_page);
SetPageUptodate(lower_page);
+   set_page_dirty(lower_page);
 
/*
 * Call lower writepage (expects locked page).  However, if we are
@@ -66,12 +67,11 @@ static int unionfs_writepage(struct page
 * success.
 */
if (wbc->for_reclaim) {
-   set_page_dirty(lower_page);
unlock_page(lower_page);
goto out_release;
}
+
BUG_ON(!lower_mapping->a_ops->writepage);
-   set_page_dirty(lower_page);
clear_page_dirty_for_io(lower_page); /* emulate VFS behavior */
err = lower_mapping->a_ops->writepage(lower_page, wbc);
if (err < 0)

[PATCH] eradicate bashisms in scripts/patch-kernel

2007-11-17 Thread Andreas Mohr

Make the patch-kernel shell script sufficiently compatible with POSIX
shells, i.e., remove bashisms from scripts/patch-kernel.
This means that it now also works on dash 0.5.3-5
and still works on bash 3.1dfsg-8.

Full changelog:
- replaced non-standard "==" by standard "="
- replaced non-standard "source" statement by POSIX "dot" command
- use leading ./ on mktemp filename to force the tempfile to a local
  directory, so that the search path is not used
- replace bash syntax to remove leading dot by similar POSIX syntax
- added missing (optional/not required) $ signs to shell variable names

Signed-off-by: Andreas Mohr <[EMAIL PROTECTED]>
---
Thanks for all comments! I might want to make sure to read more specs
next time...
Cowardly didn't dare to pre-add Randy's line, feel free to ack ;)

--- linux-2.6.23/scripts/patch-kernel.orig  2007-11-17 21:26:47.0 +0100
+++ linux-2.6.23/scripts/patch-kernel   2007-11-17 21:27:59.0 +0100
@@ -65,7 +65,7 @@
 patchdir=${2-.}
 stopvers=${3-default}
 
-if [ "$1" == -h -o "$1" == --help -o ! -r "$sourcedir/Makefile" ]; then
+if [ "$1" = -h -o "$1" = --help -o ! -r "$sourcedir/Makefile" ]; then
 cat << USAGE
 usage: $PNAME [-h] [ sourcedir [ patchdir [ stopversion ] [ -acxx ] ] ]
   source directory defaults to /usr/src/linux,
@@ -182,10 +182,12 @@
 }
 
 # set current VERSION, PATCHLEVEL, SUBLEVEL, EXTRAVERSION
-TMPFILE=`mktemp .tmpver.XX` || { echo "cannot make temp file" ; exit 1; }
+# force $TMPFILEs below to be in local directory: a slash character prevents
+# the dot command from using the search path.
+TMPFILE=`mktemp ./.tmpver.XX` || { echo "cannot make temp file" ; exit 1; }
 grep -E "^(VERSION|PATCHLEVEL|SUBLEVEL|EXTRAVERSION)" $sourcedir/Makefile > $TMPFILE
 tr -d [:blank:] < $TMPFILE > $TMPFILE.1
-source $TMPFILE.1
+. $TMPFILE.1
 rm -f $TMPFILE*
 if [ -z "$VERSION" -o -z "$PATCHLEVEL" -o -z "$SUBLEVEL" ]
 then
@@ -202,11 +204,7 @@
 EXTRAVER=
 if [ x$EXTRAVERSION != "x" ]
 then
-   if [ ${EXTRAVERSION:0:1} == "." ]; then
-   EXTRAVER=${EXTRAVERSION:1}
-   else
-   EXTRAVER=$EXTRAVERSION
-   fi
+   EXTRAVER=${EXTRAVERSION#.}
EXTRAVER=${EXTRAVER%%[[:punct:]]*}
#echo "$PNAME: changing EXTRAVERSION from $EXTRAVERSION to $EXTRAVER"
 fi
@@ -251,16 +249,16 @@
 do
 CURRENTFULLVERSION="$VERSION.$PATCHLEVEL.$SUBLEVEL"
 EXTRAVER=
-if [ $stopvers == $CURRENTFULLVERSION ]; then
+if [ $stopvers = $CURRENTFULLVERSION ]; then
 echo "Stopping at $CURRENTFULLVERSION base as requested."
 break
 fi
 
-SUBLEVEL=$((SUBLEVEL + 1))
+SUBLEVEL=$(($SUBLEVEL + 1))
 FULLVERSION="$VERSION.$PATCHLEVEL.$SUBLEVEL"
 #echo "#___ trying $FULLVERSION ___"
 
-if [ $((SUBLEVEL)) -gt $((STOPSUBLEVEL)) ]; then
+if [ $(($SUBLEVEL)) -gt $(($STOPSUBLEVEL)) ]; then
echo "Stopping since sublevel ($SUBLEVEL) is beyond stop-sublevel ($STOPSUBLEVEL)"
exit 1
 fi
@@ -297,7 +295,7 @@
 if [ x$gotac != x ]; then
   # Out great user wants the -ac patches
# They could have done -ac (get latest) or -acxx where xx=version they want
-   if [ $gotac == "-ac" ]; then
+   if [ $gotac = "-ac" ]; then
  # They want the latest version
HIGHESTPATCH=0
for PATCHNAMES in $patchdir/patch-${CURRENTFULLVERSION}-ac*\.*

Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8

2007-11-17 Thread Greg KH

On Sat, Nov 17, 2007 at 08:05:33PM +, David wrote:
> Greg KH wrote:
> > On Sat, Nov 17, 2007 at 07:21:35PM +0100, Javier Kohen wrote:
> >   
> >> I upgraded today from 2.6.23 to 2.6.23.8 and started seeing a lot of
> >> these in the logs:
> >> 
> >
> > Can you see if the problem showed up in 2.6.23.2 or .3 to help narrow
> > this down?
> >   
> This is the culprit, reverting fixes the issue.
> 
> Cheers
> David
> 
> --- a/kernel/softlockup.c
> +++ b/kernel/softlockup.c
> @@ -80,10 +80,11 @@ void softlockup_tick(void)
> print_timestamp = per_cpu(print_timestamp, this_cpu);
> 
> /* report at most once a second */
> -   if (print_timestamp < (touch_timestamp + 1) ||
> -   did_panic ||
> -   !per_cpu(watchdog_task, this_cpu))
> +   if ((print_timestamp >= touch_timestamp &&
> +   print_timestamp < (touch_timestamp + 1)) ||
> +   did_panic || !per_cpu(watchdog_task, this_cpu)) {
> return;
> +   }
> 
> /* do not print during early bootup: */
> if (unlikely(system_state != SYSTEM_RUNNING)) {
> 


Great, thanks for tracking this down.

Ingo, this corresponds to changeset
a115d5caca1a2905ba7a32b408a6042b20179aaa in mainline.  Is that patch
incorrect?  Should this patch in the -stable tree be reverted?

thanks,

greg k-h
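
To see what the quoted change does to the early-return test, here is a minimal sketch that isolates just the timestamp comparison. The function names are hypothetical, and the did_panic and watchdog_task terms are deliberately left out, so this covers only part of the real condition.

```c
#include <stdbool.h>

/* Timestamp test as on the "-" lines of the quoted diff. */
static bool skip_report_old(unsigned long print_ts, unsigned long touch_ts)
{
	return print_ts < touch_ts + 1;
}

/* Timestamp test as on the "+" lines of the quoted diff. */
static bool skip_report_new(unsigned long print_ts, unsigned long touch_ts)
{
	return print_ts >= touch_ts && print_ts < touch_ts + 1;
}
```

Whenever print_ts is older than touch_ts, the old form returns early and the new form does not, so the code after the test, including the report itself, now gets a chance to run. That fits the observation that the warnings only started appearing once this change was applied.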

Re: [PATCH] task_pid_nr_ns() breaks proc_pid_readdir()

2007-11-17 Thread Eric W. Biederman

Oleg Nesterov <[EMAIL PROTECTED]> writes:

> proc_pid_readdir:
>
>   for (...; ...; task = next_tgid(tgid + 1, ns)) {
>   tgid = task_pid_nr_ns(task, ns);
>   ... use tgid ...
>
> The first problem is that task_pid_nr_ns() can race with RCU and read the
> freed memory.
>
> However, rcu_read_lock() can't help. next_tgid() returns a pinned task_struct,
> but the task can be released (and it's pid detached) before task_pid_nr_ns()
> reads the pid_t value. In that case task_pid_nr_ns() returns 0 thus breaking
> the whole logic.
>
> Make sure that task_pid_nr_ns() returns !0 before updating tgid. Note that
> next_tgid(tgid + 1) can find the same "struct pid" again, but we shouldn't
> go into the endless loop because pid_task(PIDTYPE_PID) must return NULL in
> this case, so next_tgid() can't return the same task.
>
> Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

Oleg I think I would rather update next_tgid to return the tgid (which
removes the need to call task_pid_nr_ns).  This keeps all of the task
iteration logic together in next_tgid.

Although looking at this in more detail, I'm half wondering if
proc_pid_make_inode() should take a struct pid instead of a task.
At first glance that looks like it might be a little simpler and more
race free.  Although it doesn't do any favors to:
>   inode->i_gid = 0;
>   if (task_dumpable(task)) {
>   inode->i_uid = task->euid;
>   inode->i_gid = task->egid;
>   }
>   security_task_to_inode(task, inode);

Anyway short of rewriting the world this is what updating next_tgid
looks like.  Opinions?

Eric


diff --git a/fs/proc/base.c b/fs/proc/base.c
index a17c268..5d9328d 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2411,7 +2411,7 @@ out:
  * Find the first task with tgid >= tgid
  *
  */
-static struct task_struct *next_tgid(unsigned int tgid,
+static struct task_struct *next_tgid(unsigned int *tgid,
struct pid_namespace *ns)
 {
struct task_struct *task;
@@ -2420,9 +2420,9 @@ static struct task_struct *next_tgid(unsigned int tgid,
rcu_read_lock();
 retry:
task = NULL;
-   pid = find_ge_pid(tgid, ns);
+   pid = find_ge_pid(*tgid, ns);
if (pid) {
-   tgid = pid_nr_ns(pid, ns) + 1;
+   *tgid = pid_nr_ns(pid, ns);
task = pid_task(pid, PIDTYPE_PID);
/* What we to know is if the pid we have find is the
 * pid of a thread_group_leader.  Testing for task
@@ -2436,8 +2436,10 @@ retry:
 * found doesn't happen to be a thread group leader.
 * As we don't care in the case of readdir.
 */
-   if (!task || !has_group_leader_pid(task))
+   if (!task || !has_group_leader_pid(task)) {
+   *tgid += 1;
goto retry;
+   }
get_task_struct(task);
}
rcu_read_unlock();
@@ -2475,10 +2477,9 @@ int proc_pid_readdir(struct file * filp, void * dirent, filldir_t filldir)
 
ns = filp->f_dentry->d_sb->s_fs_info;
tgid = filp->f_pos - TGID_OFFSET;
-   for (task = next_tgid(tgid, ns);
+   for (task = next_tgid(&tgid, ns);
 task;
-put_task_struct(task), task = next_tgid(tgid + 1, ns)) {
-   tgid = task_pid_nr_ns(task, ns);
+put_task_struct(task), tgid += 1, task = next_tgid(&tgid, ns)) {
filp->f_pos = tgid + TGID_OFFSET;
if (proc_pid_fill_cache(filp, dirent, filldir, task, tgid) < 0) {
put_task_struct(task);
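
The calling convention proposed in the patch above (the lookup takes a pointer to the cursor and writes back the id it actually found) can be tried outside the kernel. What follows is only a minimal userspace sketch under stated assumptions: struct item, table, next_leader(), sum_leader_ids() and check_iteration() are hypothetical names, a sorted array stands in for find_ge_pid(), and there is no RCU or reference counting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task: an id plus a "group leader" flag. */
struct item {
	unsigned int id;
	int is_leader;
};

/* Sorted by id; id 4 plays the role of a non-leader thread. */
static const struct item table[] = {
	{ 3, 1 }, { 4, 0 }, { 7, 1 }, { 9, 1 },
};

/*
 * Return the first leader with id >= *cursor and store that id in
 * *cursor, so the caller never has to ask the item for its id again.
 * Returns NULL when nothing is left.
 */
static const struct item *next_leader(unsigned int *cursor)
{
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (table[i].id >= *cursor && table[i].is_leader) {
			*cursor = table[i].id;
			return &table[i];
		}
	}
	return NULL;
}

/* Walk all leaders the way the patched readdir loop does. */
static unsigned int sum_leader_ids(void)
{
	unsigned int cursor = 0, sum = 0;
	const struct item *it;

	for (it = next_leader(&cursor); it;
	     cursor += 1, it = next_leader(&cursor))
		sum += cursor;	/* cursor already holds the id that was found */
	return sum;
}

/* Hypothetical self-check: leaders are 3, 7 and 9. */
static void check_iteration(void)
{
	assert(sum_leader_ids() == 3 + 7 + 9);
}
```

The point of the design is that the id used by the loop body is the one recorded during the lookup itself, so nothing has to be re-read from the returned object after the lookup has finished.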

Re: broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-17 Thread Rafael J. Wysocki

On Saturday, 17 of November 2007, Jiri Slaby wrote:
> On 11/16/2007 05:10 PM, Alan Stern wrote:
> > On Thu, 15 Nov 2007, Greg KH wrote:
> > 
> >>> The offending -mm patch is
> >>> gregkh-driver-pm-acquire-device-locks-prior-to-suspending.patch
> >>>
> > >>> 2.6.24-rc2-mm1 minus it works just fine; PROVE_LOCKING shows nothing new
> > >>> when the patch is applied.
> >> Thanks for tracking this down.  Alan, any thoughts?
> > 
> > It's a driver problem somewhere.  Probably not one of the most common 
> > drivers because I don't see the same problem here (but then I'm not 
> > testing -mm).
> > 
> > The thing to do is figure out which driver is causing the problem.
> > Jiri, try enabling CONFIG_DEBUG_DRIVER.  
> 
> Sadly no output.
> 
> > If there's also a config 
> > option to prevent the console from being suspended, set it as well.  
> 
> no_suspend_console kernel parameter has no effect (why?).

I'm not sure.

Please try to set CONFIG_PM_VERBOSE.

Re: [PATCH] do_task_stat: don't use task_pid_nr_ns() lockless

2007-11-17 Thread Eric W. Biederman

Oleg Nesterov <[EMAIL PROTECTED]> writes:

> Without rcu/tasklist/siglock lock task_pid_nr_ns() may read the freed memory,
> move the callsite under ->siglock.
>
> Sadly, we can report pid == 0 if the task was detached.

We only get detached in release_task so it is a pretty small window
where we can return pid == 0.  Usually get_task_pid will fail first
and we will return -ESRCH.  Still the distance from open to 

There is another bug in here as well.  current->nsproxy->pid_ns is wrong.
What we want is: ns = dentry->d_sb->s_fs_info;

Otherwise we will have file descriptor passing races and the like.

We could also do: proc_pid(inode) to get the pid, which is a little
more race free, and will prevent us from returning pid == 0.

In either event it looks like we need to implement some proper
file operations for these proc files, maybe even going to seq file
status.

Eric

Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

2007-11-17 Thread Torsten Kaiser

On Nov 17, 2007 8:33 PM, Christoph Lameter <[EMAIL PROTECTED]> wrote:
> On Sat, 17 Nov 2007, Andrew Morton wrote:
>
> > That's slub.  It appears that list_lock is being taken from process context
> > in one place and from softirq in another.
>
> I kicked out some weird interrupt disable code in mm that was only run during
> NUMA bootstrap.

I'm using NUMA (Opteron), so this indeed fixes it.

A kernel compiled with SLUB now outputs the same message as the SLAB
one, that lockdep annotations are needed at the place where nfs hangs.

> This should fix it but isnt there some mechanism to convince lockdep that
> it is okay to do these things during bootstrap?
>
> ---
>  mm/slub.c |2 ++
>  1 file changed, 2 insertions(+)
>
> Index: linux-2.6/mm/slub.c
> ===
> --- linux-2.6.orig/mm/slub.c2007-11-17 11:31:21.044136631 -0800
> +++ linux-2.6/mm/slub.c 2007-11-17 11:32:17.364386560 -0800
> @@ -2044,7 +2044,9 @@ static struct kmem_cache_node *early_kme
>  #endif
> init_kmem_cache_node(n);
> atomic_long_inc(&n->nr_slabs);
> +   local_irq_disable();
> add_partial(kmalloc_caches, page, 0);
> +   local_irq_enable();
> return n;
>  }
>
>
>

Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8

2007-11-17 Thread David

Greg KH wrote:
> On Sat, Nov 17, 2007 at 07:21:35PM +0100, Javier Kohen wrote:
>   
>> I upgraded today from 2.6.23 to 2.6.23.8 and started seeing a lot of
>> these in the logs:
>> 
>
> Can you see if the problem showed up in 2.6.23.2 or .3 to help narrow
> this down?
>   
This is the culprit, reverting fixes the issue.

Cheers
David

--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -80,10 +80,11 @@ void softlockup_tick(void)
print_timestamp = per_cpu(print_timestamp, this_cpu);

/* report at most once a second */
-   if (print_timestamp < (touch_timestamp + 1) ||
-   did_panic ||
-   !per_cpu(watchdog_task, this_cpu))
+   if ((print_timestamp >= touch_timestamp &&
+   print_timestamp < (touch_timestamp + 1)) ||
+   did_panic || !per_cpu(watchdog_task, this_cpu)) {
return;
+   }

/* do not print during early bootup: */
if (unlikely(system_state != SYSTEM_RUNNING)) {
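
Where the two versions of the early-return test differ can be seen in isolation with a small userspace sketch. It keeps only the two timestamp comparisons from the diff above; did_panic and the watchdog_task check are left out, the timestamps are treated as plain unsigned integers, and skip_report_old(), skip_report_new() and check_predicates() are names made up for illustration. It does not try to explain the lockups, only to show which inputs the tests disagree on.

```c
#include <assert.h>
#include <stdbool.h>

/* Early-return test from the "-" lines of the diff. */
static bool skip_report_old(unsigned long print_ts, unsigned long touch_ts)
{
	return print_ts < touch_ts + 1;
}

/* Early-return test from the "+" lines of the diff. */
static bool skip_report_new(unsigned long print_ts, unsigned long touch_ts)
{
	return print_ts >= touch_ts && print_ts < touch_ts + 1;
}

/* Hypothetical self-check covering the three orderings of the timestamps. */
static void check_predicates(void)
{
	/* last print older than last touch: only the old test returns early */
	assert(skip_report_old(5, 10) && !skip_report_new(5, 10));
	/* both timestamps equal: both tests return early */
	assert(skip_report_old(10, 10) && skip_report_new(10, 10));
	/* last print newer than last touch: neither test returns early */
	assert(!skip_report_old(11, 10) && !skip_report_new(11, 10));
}
```

With integer timestamps the two tests disagree only when print_timestamp is smaller than touch_timestamp: the "-" version returns early there, while the "+" version falls through to the checks that follow.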


Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8

2007-11-17 Thread Greg KH

On Sat, Nov 17, 2007 at 07:21:35PM +0100, Javier Kohen wrote:
> I upgraded today from 2.6.23 to 2.6.23.8 and started seeing a lot of
> these in the logs:

Can you see if the problem showed up in 2.6.23.2 or .3 to help narrow
this down?

thanks,

greg k-h

Re: Soft lockups since stable kernel upgrade to 2.6.23.8

2007-11-17 Thread David

Javier Kohen wrote:
> I upgraded today from 2.6.23 to 2.6.23.8 and started seeing a lot of
> these in the logs:
>
> BUG: soft lockup detected on CPU#0!
>  [] update_process_times+0x32/0x54
>  [] tick_sched_timer+0x5e/0x99
>  [] hrtimer_interrupt+0x112/0x197
>  [] tick_sched_timer+0x0/0x99
>  [] smp_apic_timer_interrupt+0x60/0x6f
>  [] acpi_hw_register_write+0x118/0x148
>  [] apic_timer_interrupt+0x28/0x30
>  [] acpi_safe_halt+0x14/0x20 [processor]
>  [] acpi_processor_idle+0x134/0x387 [processor]
>  [] cpu_idle+0x46/0x59
>  [] start_kernel+0x23c/0x241
>  [] unknown_bootoption+0x0/0x196
>   
Confirmed on my server machine. It seems to coincide with cpufreq processor
speed changes. Config available on request (the system is also a single-core
Athlon).

Cheers
David

>  ===
> BUG: soft lockup detected on CPU#0!
>  [] update_process_times+0x32/0x54
>  [] tick_sched_timer+0x5e/0x99
>  [] hrtimer_interrupt+0x112/0x197
>  [] tick_sched_timer+0x0/0x99
>  [] smp_apic_timer_interrupt+0x60/0x6f
>  [] apic_timer_interrupt+0x28/0x30
>  ===
> BUG: soft lockup detected on CPU#0!
>  [] update_process_times+0x32/0x54
>  [] fill_window+0x29d/0x384
>  [] tick_sched_timer+0x5e/0x99
>  [] hrtimer_interrupt+0x112/0x197
>  [] tick_sched_timer+0x0/0x99
>  [] zlib_inflate_table+0x1d9/0x4c0
>  [] zlib_inflate_table+0x1d9/0x4c0
>  [] tick_do_broadcast+0x1f/0x3f
>  [] tick_handle_oneshot_broadcast+0x47/0x7f
>  [] timer_interrupt+0x1a/0x20
>  [] handle_IRQ_event+0x1a/0x3f
>  [] handle_edge_irq+0x8b/0xd7
>  [] do_IRQ+0x53/0x6c
>  [] tick_notify+0x161/0x220
>  [] common_interrupt+0x23/0x28
>  [] acpi_processor_idle+0x22c/0x387 [processor]
>  [] cpu_idle+0x46/0x59
>  [] start_kernel+0x23c/0x241
>  [] unknown_bootoption+0x0/0x196
>
> I'm getting them in the hundreds but I had never seen them before this
> upgrade. CPU is a single CPU, single core AMD Turion running in 32-bit
> mode. Apparently they only occur when the ondemand governor is used. I
> switched to the powersave and the performance governors for a while and
> didn't see any message, but as soon as I went back to ondemand, the
> messages started showing up again.
>
> I see the problem might have to do with timers. In case it's relevant,
> the available clock sources are acpi_pm pit jiffies tsc, of which
> acpi_pm is the current one in use. I'm including the kernel config as
> well.
>
> Please CC, since I'm not subscribed to this list.
>
> Modules Loaded: nls_iso8859_1 nls_cp437 vfat fat radeon drm af_packet
> binfmt_misc capability commoncap ipv6 iptable_mangle iptable_filter
> ip_tables x_tables ext2 snd_seq_dummy snd_seq_oss snd_seq_midi
> snd_rawmidi snd_seq_midi_event snd_seq snd_seq_device cpufreq_ondemand
> cpufreq_conservative cpufreq_powersave powernow_k8 freq_table snd_atiixp
> snd_atiixp_modem snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss
> snd_pcm snd_timer battery ac snd yenta_socket rsrc_nonstatic pcmcia_core
> tifm_7xx1 tifm_core button soundcore snd_page_alloc psmouse pcspkr evdev
> k8temp hwmon rtc sha256 aes dm_crypt dm_mirror dm_snapshot dm_mod sg
> sd_mod sr_mod cdrom 8139cp usb_storage ohci1394 pata_atiixp 8139too mii
> bitrev crc32 ehci_hcd ieee1394 libata ohci_hcd usbcore thermal processor
> fan
>
>
> CONFIG_X86_32=y
> CONFIG_GENERIC_TIME=y
> CONFIG_GENERIC_CMOS_UPDATE=y
> CONFIG_CLOCKSOURCE_WATCHDOG=y
> CONFIG_GENERIC_CLOCKEVENTS=y
> CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
> CONFIG_LOCKDEP_SUPPORT=y
> CONFIG_STACKTRACE_SUPPORT=y
> CONFIG_SEMAPHORE_SLEEPERS=y
> CONFIG_X86=y
> CONFIG_MMU=y
> CONFIG_ZONE_DMA=y
> CONFIG_QUICKLIST=y
> CONFIG_GENERIC_ISA_DMA=y
> CONFIG_GENERIC_IOMAP=y
> CONFIG_GENERIC_BUG=y
> CONFIG_GENERIC_HWEIGHT=y
> CONFIG_ARCH_MAY_HAVE_PC_FDC=y
> CONFIG_DMI=y
> CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
>
> CONFIG_EXPERIMENTAL=y
> CONFIG_BROKEN_ON_SMP=y
> CONFIG_INIT_ENV_ARG_LIMIT=32
> CONFIG_LOCALVERSION=""
> CONFIG_SWAP=y
> CONFIG_SYSVIPC=y
> CONFIG_SYSVIPC_SYSCTL=y
> CONFIG_POSIX_MQUEUE=y
> CONFIG_BSD_PROCESS_ACCT=y
> CONFIG_LOG_BUF_SHIFT=14
> CONFIG_BLK_DEV_INITRD=y
> CONFIG_INITRAMFS_SOURCE=""
> CONFIG_CC_OPTIMIZE_FOR_SIZE=y
> CONFIG_SYSCTL=y
> CONFIG_UID16=y
> CONFIG_SYSCTL_SYSCALL=y
> CONFIG_KALLSYMS=y
> CONFIG_HOTPLUG=y
> CONFIG_PRINTK=y
> CONFIG_BUG=y
> CONFIG_ELF_CORE=y
> CONFIG_BASE_FULL=y
> CONFIG_FUTEX=y
> CONFIG_ANON_INODES=y
> CONFIG_EPOLL=y
> CONFIG_SIGNALFD=y
> CONFIG_EVENTFD=y
> CONFIG_SHMEM=y
> CONFIG_VM_EVENT_COUNTERS=y
> CONFIG_SLUB_DEBUG=y
> CONFIG_SLUB=y
> CONFIG_RT_MUTEXES=y
> CONFIG_BASE_SMALL=0
> CONFIG_MODULES=y
> CONFIG_MODULE_UNLOAD=y
> CONFIG_KMOD=y
> CONFIG_BLOCK=y
> CONFIG_LSF=y
>
> CONFIG_IOSCHED_NOOP=y
> CONFIG_IOSCHED_AS=y
> CONFIG_IOSCHED_DEADLINE=y
> CONFIG_IOSCHED_CFQ=y
> CONFIG_DEFAULT_CFQ=y
> CONFIG_DEFAULT_IOSCHED="cfq"
>
> CONFIG_TICK_ONESHOT=y
> CONFIG_NO_HZ=y
> CONFIG_HIGH_RES_TIMERS=y
> CONFIG_X86_PC=y
> CONFIG_MK8=y
> CONFIG_X86_CMPXCHG=y
> CONFIG_X86_L1_CACHE_SHIFT=6
> CONFIG_X86_XADD=y
> CONFIG_RWSEM_XCHGADD_ALGORITHM=y
>

Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

2007-11-17 Thread Torsten Kaiser

On Nov 17, 2007 7:19 PM, Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> On Sat, 17 Nov 2007 19:09:46 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> >
> > * Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> >
> > > Sadly lockdep does not work for me, as it gets turned off early:
> > > [   39.851594] -
> > > [   39.855963] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
> > > [   39.861981] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
> > > [   39.866963]  (&n->list_lock){-+..}, at: []
> >
> > hey, that means it found a bug - which is not sad at all :-)

It was sad, that it found a bug that I was not searching for. ;)

> mutter.
>
> Torsten, you could try CONFIG_SLAB=y, CONFIG_SLUB=n to see if you can make
> some progress on the NFS problem.

I should have thought of that myself... OK anyway here is the result:

The hang is reproducible, emerge froze the system again after downloading
the source.
Lockdep triggers immediately before the freeze, but the result is still
not helpful:

[  221.565011] INFO: trying to register non-static key.
[  221.566999] the code is fine but needs lockdep annotation.
[  221.569206] turning off the locking correctness validator.
[  221.571404]
[  221.571405] Call Trace:
[  221.572996]  [] __lock_acquire+0x4c4/0x1140
[  221.575298]  [] lock_acquire+0x55/0x70
[  221.577429]  [] __wake_up+0x2d/0x70
[  221.579457]  [] _spin_lock_irqsave+0x34/0x50
[  221.581800]  [] _spin_unlock_irqrestore+0x55/0x70
[  221.584317]  [] __wake_up+0x2d/0x70
[  221.586344]  [] rpc_async_schedule+0x0/0x10
[  221.588648]  [] nfs_free_unlinkdata+0x1e/0x50
[  221.591023]  [] rpc_release_calldata+0x26/0x50
[  221.593428]  [] run_workqueue+0x16f/0x210
[  221.595662]  [] trace_hardirqs_on+0xc1/0x160
[  221.598004]  [] worker_thread+0x0/0xb0
[  221.600130]  [] worker_thread+0x0/0xb0
[  221.602265]  [] worker_thread+0x6d/0xb0
[  221.604431]  [] autoremove_wake_function+0x0/0x30
[  221.606939]  [] worker_thread+0x0/0xb0
[  221.609067]  [] worker_thread+0x0/0xb0
[  221.611199]  [] kthread+0x4b/0x80
[  221.613156]  [] child_rip+0xa/0x12
[  221.615151]  [] restore_args+0x0/0x30
[  221.617247]  [] kthread+0x0/0x80
[  221.619162]  [] child_rip+0x0/0x12
[  221.621147]
[  221.621749] INFO: lockdep is turned off.
[  226.369259] SysRq : Emergency Sync
[  226.331342] Emergency Sync complete
[  227.064545] SysRq : Emergency Remount R/O
[  228.193491] SysRq : Emergency Sync
[  228.155593] Emergency Sync complete
[  228.767931] SysRq : Resetting

I also had another BUG output during system startup, but that should
be unrelated:
[  103.254681] BUG: sleeping function called from invalid context at kernel/rwsem.c:20
[  103.257757] in_atomic():0, irqs_disabled():1
[  103.259469] 1 lock held by artsd/5883:
[  103.259470]  #0:  (pm_qos_lock){}, at: [] pm_qos_add_requirement+0x6b/0xf0
[  103.263316] irq event stamp: 49712
[  103.263318] hardirqs last  enabled at (49711): [] __kmalloc+0x10d/0x180
[  103.263321] hardirqs last disabled at (49712): [] _spin_lock_irqsave+0x1a/0x50
[  103.263326] softirqs last  enabled at (48820): [] unix_release_sock+0x79/0x240
[  103.263330] softirqs last disabled at (48818): [] _write_lock_bh+0x9/0x30
[  103.26]
[  103.26] Call Trace:
[  103.263335]  [] down_read+0x15/0x40
[  103.263338]  [] __blocking_notifier_call_chain+0x46/0x90
[  103.263341]  [] pm_qos_add_requirement+0x93/0xf0
[  103.263344]  [] snd_pcm_hw_params+0x2fa/0x380
[  103.263347]  [] snd_pcm_common_ioctl1+0xb4c/0xdc0
[  103.263350]  [] __do_fault+0x227/0x470
[  103.263353]  [] __lock_acquire+0x745/0x1140
[  103.263357]  [] _spin_unlock_irqrestore+0x55/0x70
[  103.263359]  [] trace_hardirqs_on+0xc1/0x160
[  103.263362]  [] snd_pcm_playback_ioctl1+0x48/0x240
[  103.263365]  [] snd_pcm_playback_ioctl+0x36/0x50
[  103.263367]  [] vfs_ioctl+0x2f/0xa0
[  103.263369]  [] do_vfs_ioctl+0x260/0x2e0
[  103.263371]  [] trace_hardirqs_on+0xc1/0x160
[  103.263373]  [] sys_ioctl+0x91/0xb0
[  103.263376]  [] system_call+0x7e/0x83
[  103.263379]

Torsten

Re: [patch] Printk kernel version in WARN_ON

2007-11-17 Thread Sam Ravnborg

On Sat, Nov 17, 2007 at 11:35:01AM -0800, Arjan van de Ven wrote:
> On Sat, 17 Nov 2007 10:46:52 -0800
> Andrew Morton <[EMAIL PROTECTED]> wrote:
> 
> > > by ... not too much at least, gcc ought to be quite good at merging
> > > same-strings into one, so it's just one extra pointer argument
> > > 
> > 
> > I think I knew that.  At 1000 callsites.
> 
> ok so how about putting the same into dump_stack() instead? (see below)
> added bonus is that it's now present for all dumps that use
> dump_stack(), not just WARN_ON()
> (the format I copied from the exact line used by oopses)

This solved the "zillion files being rebuilt" issue I mentioned.
So from that angle it is better.

And I notice you use the namespace aware helpers to access the
kernelrelease string - I assume this is better than direct use
of UTS_RELEASE.

Sam

Linux 2.4.36-pre2

2007-11-17 Thread Willy Tarreau

I've just released Linux 2.4.36-pre2.

Basically, it gets in sync with 2.4.35.4, and adds DMA support for
2 PCI IDE chipsets (ICH7 and JMicron 20363).

There's just the adutux driver pending before 2.4.36. I plan to issue
2.4.36-rc1 in about 2-3 weeks (first week of December), and -final the
following week if nothing goes wrong by then.

The patch and changelog will appear soon at the following locations:
  ftp://ftp.all.kernel.org/pub/linux/kernel/v2.4/testing/
  ftp://ftp.all.kernel.org/pub/linux/kernel/v2.4/testing/patch-2.4.36-pre2.bz2
  ftp://ftp.all.kernel.org/pub/linux/kernel/v2.4/testing/patch-2.4.36.log

Git repository:
   git://git.kernel.org/pub/scm/linux/kernel/git/wtarreau/linux-2.4.git
  http://www.kernel.org/pub/scm/linux/kernel/git/wtarreau/linux-2.4.git/

Git repository through the gitweb interface:
  http://git.kernel.org/?p=linux/kernel/git/wtarreau/linux-2.4.git


Regards,
Willy
---

Summary of changes from v2.4.36-pre1 to v2.4.36-pre2


Andi Kleen (1):
  x86_64: Make sure to validate all 64bits of ptrace information

Franck Bourdonnec (1):
  fix missing MODULE_LICENSE in some drivers

Gilles Espinasse (1):
  fix unresolved symbols on alpha

Moritz Muehlenhoff (1):
  corrupted cramfs filesystems cause kernel oops (CVE-2006-5823)

Stephen Hemminger (1):
  Bridge STP timer fixes

Tony Battersby (1):
  sym53c8xx_2 SMP deadlock on driver load

Willy Tarreau (3):
  ATM: avoid kernel panic upon access to /proc/net/atm/arp
  PPP: fix crash using usb-serial on high speed devices
  Change VERSION to 2.4.36-pre2

dann frazier (4):
  [OpenPROM]: Fix signedness bug in openprom char driver
  [OpenPROM]: Fix user-access checking bugs in openpromfs
  [OpenPROM] Prevent overflow of sprintf buffer
  [OpenPROM] Prevent unsigned roll-overs in

[EMAIL PROTECTED] (2):
  IDE: enable support for JMicron 20363
  IDE: enable PATA UDMA support for ICH7


Re: [patch] Printk kernel version in WARN_ON

2007-11-17 Thread Arjan van de Ven

On Sat, 17 Nov 2007 10:46:52 -0800
Andrew Morton <[EMAIL PROTECTED]> wrote:

> > by ... not too much at least, gcc ought to be quite good at merging
> > same-strings into one, so it's just one extra pointer argument
> > 
> 
> I think I knew that.  At 1000 callsites.

ok so how about putting the same into dump_stack() instead? (see below)
added bonus is that it's now present for all dumps that use
dump_stack(), not just WARN_ON()
(the format I copied from the exact line used by oopses)

Subject: printk kernel version in WARN_ON and other dump_stack users
From: Arjan van de Ven <[EMAIL PROTECTED]>

today, all oopses contain a version number of the kernel, which is nice
because the people who actually do bother to read the oops get this
vital bit of information always without having to ask the reporter in
another round trip.

However, WARN_ON() and many other dump_stack() users right now lack this 
information; the patch below adds this. This information is essential for 
getting people to use their time effectively when looking at these things;
in addition, it's essential for tools that try to collect statistics about 
defects.

Please consider, maybe even for 2.6.24 since it's so simple and
important for long term quality processes

The code is identical between 32/64 bit; a lot of this code should be unified
over time, and the patch keeps the two copies identical.

Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>

--- linux-2.6.24-rc3/arch/x86/kernel/traps_32.c.org 2007-11-17 11:26:17.0 -0800
+++ linux-2.6.24-rc3/arch/x86/kernel/traps_32.c 2007-11-17 11:29:12.0 -0800
@@ -283,6 +283,11 @@ void dump_stack(void)
 {
unsigned long stack;

+   printk("Pid: %d, comm: %.20s %s %s %.*s\n",
+   current->pid, current->comm, print_tainted(),
+   init_utsname()->release,
+   (int)strcspn(init_utsname()->version, " "),
+   init_utsname()->version);
show_trace(current, NULL, &stack);
 }

--- linux-2.6.24-rc3/arch/x86/kernel/traps_64.c.org 2007-11-17 11:26:25.0 -0800
+++ linux-2.6.24-rc3/arch/x86/kernel/traps_64.c 2007-11-17 11:29:22.0 -0800
@@ -400,6 +400,12 @@ void show_stack(struct task_struct *tsk,
 void dump_stack(void)
 {
unsigned long dummy;
+
+   printk("Pid: %d, comm: %.20s %s %s %.*s\n",
+   current->pid, current->comm, print_tainted(),
+   init_utsname()->release,
+   (int)strcspn(init_utsname()->version, " "),
+   init_utsname()->version);
show_trace(NULL, NULL, &dummy);
 }
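
The "%.*s" plus strcspn() idiom used in the printk above prints only the part of the version string that comes before the first space. A minimal userspace sketch of the same idiom follows, with snprintf() standing in for printk(). The function names release_line() and check_release_line() and the sample version strings are assumptions made for illustration, not taken from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "<release> <first word of version>" into a static buffer.
 * Not reentrant; good enough for a demonstration.
 */
static const char *release_line(const char *release, const char *version)
{
	static char buf[128];

	/*
	 * strcspn() returns the length of the leading run of version that
	 * contains no space, and "%.*s" prints at most that many characters.
	 */
	snprintf(buf, sizeof(buf), "%s %.*s",
		 release, (int)strcspn(version, " "), version);
	return buf;
}

/* Hypothetical self-check with made-up version strings. */
static void check_release_line(void)
{
	assert(strcmp(release_line("2.6.24-rc3",
				   "#1 SMP Sat Nov 17 11:29:12 PST 2007"),
		      "2.6.24-rc3 #1") == 0);
	/* no space at all: strcspn() returns the full length */
	assert(strcmp(release_line("2.6.24-rc3", "#42"),
		      "2.6.24-rc3 #42") == 0);
}
```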

-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

2007-11-17 Thread Christoph Lameter

On Sat, 17 Nov 2007, Andrew Morton wrote:

> > Don't know who to bug about that.
> 
> That's slub.  It appears that list_lock is being taken from process context
> in one place and from softirq in another.

I kicked out some weird interrupt disable code in mm that was only run during
NUMA bootstrap.

This should fix it but isnt there some mechanism to convince lockdep that 
it is okay to do these things during bootstrap?

---
 mm/slub.c |2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6/mm/slub.c
===
--- linux-2.6.orig/mm/slub.c2007-11-17 11:31:21.044136631 -0800
+++ linux-2.6/mm/slub.c 2007-11-17 11:32:17.364386560 -0800
@@ -2044,7 +2044,9 @@ static struct kmem_cache_node *early_kme
 #endif
init_kmem_cache_node(n);
atomic_long_inc(&n->nr_slabs);
+   local_irq_disable();
add_partial(kmalloc_caches, page, 0);
+   local_irq_enable();
return n;
 }
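
The rule behind a lockdep report of the form "inconsistent {softirq-on-W} -> {in-softirq-W} usage" can be modelled in a few lines. This is a toy model only, not lockdep's implementation: the type and function names (acquire_ctx, lock_usage, record_acquire, usage_is_consistent, check_usage_model) are hypothetical, and it assumes that an acquisition made with interrupts disabled counts as one that a softirq cannot interrupt on the same CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Context in which a lock is taken (toy model). */
enum acquire_ctx {
	CTX_PROCESS_IRQS_ON,	/* process context, interrupts enabled */
	CTX_PROCESS_IRQS_OFF,	/* process context, interrupts disabled */
	CTX_SOFTIRQ,		/* taken from softirq context */
};

/* Usage bits remembered per lock. */
struct lock_usage {
	bool softirq_on;	/* ever taken where a softirq could interrupt */
	bool in_softirq;	/* ever taken from a softirq */
};

/* Record one acquisition; return false once both usages have been seen. */
static bool record_acquire(struct lock_usage *u, enum acquire_ctx ctx)
{
	if (ctx == CTX_PROCESS_IRQS_ON)
		u->softirq_on = true;
	if (ctx == CTX_SOFTIRQ)
		u->in_softirq = true;
	return !(u->softirq_on && u->in_softirq);
}

/* Replay two acquisitions of the same lock and report the verdict. */
static bool usage_is_consistent(enum acquire_ctx first, enum acquire_ctx second)
{
	struct lock_usage u = { false, false };

	record_acquire(&u, first);
	return record_acquire(&u, second);
}

/* Hypothetical self-check for the two cases discussed in the thread. */
static void check_usage_model(void)
{
	/* unprotected process-context use plus softirq use: flagged */
	assert(!usage_is_consistent(CTX_PROCESS_IRQS_ON, CTX_SOFTIRQ));
	/* interrupts disabled around the process-context use: accepted */
	assert(usage_is_consistent(CTX_PROCESS_IRQS_OFF, CTX_SOFTIRQ));
}
```

In this model, wrapping the bootstrap-time add_partial() call in local_irq_disable()/local_irq_enable() corresponds to moving that acquisition from the first context to the second.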
 


Re: [patch] Printk kernel version in WARN_ON

2007-11-17 Thread Sam Ravnborg

On Sat, Nov 17, 2007 at 10:15:52AM -0800, Arjan van de Ven wrote:
> Hi,
> 
> today, all oopses contain a version number of the kernel, which is nice
> because the people who actually do bother to read the oops get this
> vital bit of information always without having to ask the reporter in
> another round trip.
> 
> However, WARN_ON() right now lacks this information; the patch below
> adds this. This information is essential for getting people to use
> their time effectively when looking at these things; in addition, it's
> essential for tools that try to collect statistics about defects.
> 
> Please consider, maybe even for 2.6.24 since its so simple and
> important for long term quality

With this change we will see zillions of files being rebuilt each
time we pick up another kernel version from git and friends.

For me it looks like this right now:
#define UTS_RELEASE "2.6.24-rc2-g99fee6d7-dirty"

committing my local changes made it look like:
#define UTS_RELEASE "2.6.24-rc2-g99fee6d7"

The above change will trigger a rebuild of all files
that reference UTS_RELEASE as will all WARN_ON users.

And this is with the default configuration.

So if we want this then we want to push that change to a
separate function so we rebuild fewer files for simple changes.
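
One way to read that suggestion, as a rough sketch rather than a proposed patch: only a single source file expands the UTS_RELEASE macro, and everything else calls a function. The names kernel_release() and release_length() are hypothetical, the #define stands in for the generated header, and the three pieces are shown in one listing only so that it compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Stand-in for the generated header.  In a real tree only the file
 * defining kernel_release() would include it.
 */
#define UTS_RELEASE "2.6.24-rc2-g99fee6d7"

/* The one place that expands the macro. */
const char *kernel_release(void)
{
	return UTS_RELEASE;
}

/*
 * A typical user: it depends on the prototype only, so a changed
 * release string does not force it to be recompiled.
 */
size_t release_length(void)
{
	return strlen(kernel_release());
}

/* Hypothetical self-check. */
static void check_release(void)
{
	assert(strcmp(kernel_release(), "2.6.24-rc2-g99fee6d7") == 0);
	assert(release_length() == 20);
}
```

When the string changes, only the file containing kernel_release() needs to be rebuilt; callers are relinked but not recompiled.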

Sam

Linux 2.4.35.4

2007-11-17 Thread Willy Tarreau

I've just released Linux 2.4.35.4.

It fixes very minor issues, but some patches have been resting here
for a long time, and there was no reason to hold them. Two minor
old vulnerabilities were fixed :
 - CVE-2006-5823 would cause the kernel to oops on specially crafted
   CRAMFS filesystems, though on 2.4.35 we never got more than an
   error in the logs. The fix was merged anyway.
 - CVE-2004-2731 corresponds to an incorrect size check in sparc's
   openprom driver. It was fixed in 2.5 but not in 2.4.

The patch and changelog will appear soon at the following locations:
  ftp://ftp.all.kernel.org/pub/linux/kernel/v2.4/
  ftp://ftp.all.kernel.org/pub/linux/kernel/v2.4/patch-2.4.35.4.bz2
  ftp://ftp.all.kernel.org/pub/linux/kernel/v2.4/ChangeLog-2.4.35.4

Git repository:
   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-v2.4.35.y.git
  http://www.kernel.org/pub/scm/linux/kernel/git/stable/linux-v2.4.35.y.git

Git repository through the gitweb interface:
  http://git.kernel.org/?p=linux/kernel/git/stable/linux-v2.4.35.y.git


Regards,
Willy

---

Summary of changes from v2.4.35.3 to v2.4.35.4


Franck Bourdonnec (1):
  fix missing MODULE_LICENSE in some drivers

Gilles Espinasse (1):
  fix unresolved symbols on alpha

Moritz Muehlenhoff (1):
  corrupted cramfs filesystems cause kernel oops (CVE-2006-5823)

Tony Battersby (1):
  sym53c8xx_2 SMP deadlock on driver load

Willy Tarreau (2):
  PPP: fix crash using usb-serial on high speed devices
  Change VERSION to 2.4.35.4

dann frazier (4):
  [OpenPROM]: Fix signedness bug in openprom char driver
  [OpenPROM]: Fix user-access checking bugs in openpromfs
  [OpenPROM] Prevent overflow of sprintf buffer
  [OpenPROM] Prevent unsigned roll-overs in


Re: More LSM vs. Containers (having nothing at all to do with the AppArmor Security Goal)

2007-11-17 Thread Casey Schaufler


--- Peter Dolding <[EMAIL PROTECTED]> wrote:

> On Nov 17, 2007 11:08 AM, Crispin Cowan <[EMAIL PROTECTED]> wrote:
> > Peter Dolding wrote:
> > >>> What is left unspecified here is 'how' a child 'with its own profile'
> is
> > >>> confined here. Are it is confined to just its own profile, it may that
> > >>> the "complicit process" communication may need to be wider specified to
> > >>> include this.
> > >>>
> > > Sorry have to bring this up.  cgroups why not?
> > Because I can't find any documentation for cgroups? :)
> >
> > >   Assign application to
> > > a cgroup that contains there filesystem access permissions.   Done
> > > right this could even be stacked.  Only give less access to
> > > application unless LSM particularly overrides.
> > >
> > This comes no where close to AppArmor's functionality:
> >
> > * Can't do learning mode
> > * Can't do wildcards
> > * Sucks up huge loads of memory to do that much FS mounting (imagine
> >   thousands of bind mounts)
> > * I'm not sure, but I don't think you can do file granularity, only
> >   directories
> >
> Ok sorry to say so far almost percent wrong.  Please note netlabels
> falls into a control group.  All function of Apparmor is doable bar
> exactly learning mode.   For learning mode that would have to be a
> hook back to a LSM I would expect.
> 
> Done right should suck up no more memory than current apparmor.  But
> it will required all LSM's doing file access to come to common
> agreement how to do it.  Not just hooks into the kernel system any
> more.

The ability to provide alternative access control schemes is the
purpose of LSM. The fact that we insane security people can't come
to the agreement you require is why we have LSM. You can not have
what you are asking for. Please suggest an alternative design.

> At the container entrance point there needs file granularity applied
> for complete and correct container isolation to be done.
> >
> > > There are reasons why I keep on bring containers up it changes the
> > > model.  Yes I know coming to a common agreement in these sections will
> > > not be simple.   But at some point it has to be done.
> > >
> > Containers and access controls provide related but different functions.
> > Stop trying to force containers to be an access control system, it does
> > not fit well at all.
> >
> > Rather, we need to ensure that LSM and containers play well together.
> > What you proposed in the past was to have an LSM module per container,
> > but I find that absurdly complex: if you want that, then use a real VMM
> > like Xen or something. Containers are mostly used for massive virtual
> > domain hosting, and what you want there is as much sharing as possible
> > while maintaining isolation. so why would you corrupt that with separate
> > LSM modules per container?
> 
> Please stop being foolish.  Xen and the like don't share memory.   You
> are basically saying blow out memory usage just because someone wants
> to use a different LSM.

Yup. No one ever said security was cheap. Most real, serious security
solutions implemented today rely on separate physical machines for
isolation. Some progressive installations are using virtualization,
and the lunatic fringe uses the sort of systems well served by LSM.
Let's face it, people who really care are willing to pay a premium. 

> Besides file access control is part of running containers isolated in
> the first place and need to be LSM neutral.

File access control is within the scope of the LSM. If your feature
can't deal with that you need to fix your feature.

> This is the problem current model just will not work.  Some features
> are need in Linux kernel all the time and have to become LSM neutral
> due to the features of containers.

Sounds like a conflict in requirements.

> Next big after filesystem most likely will be the common security
> controls for devices.  These are just features need to complete
> containers.  Basically to do containers LSM have to be cut up.  Or
> containers function will be dependent on the current LSM to be use
> completely.

I would be perfectly happy without containers, just as I understand
you don't give a rat's pitute about LSMs. If you want my cooperation
and/or backing on containers, show me how they make my life better,
and how cutting up LSM is to my advantage. I am perfectly willing
to consider alternatives, but I confess that my natural reaction to
confrontation is to fight back.


Casey Schaufler
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

2007-11-17 Thread Torsten Kaiser

On Nov 17, 2007 7:58 PM, Trond Myklebust <[EMAIL PROTECTED]> wrote:
>
> On Sat, 2007-11-17 at 18:53 +0100, Torsten Kaiser wrote:
> > On Nov 16, 2007 3:15 PM, Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
> > > Hi Andrew,
> > >
> > > The kernel enters the xmon state while running the file system
> > > stress on nfs v4 mounted partition.
> > [snip]
> > > 0:mon> t
> > > [c000dbd4fb50] c0069768 .__wake_up+0x54/0x88
> > > [c000dbd4fc00] d086b890 .nfs_sb_deactive+0x44/0x58 [nfs]
> > > [c000dbd4fc80] d0872658 .nfs_free_unlinkdata+0x2c/0x74 [nfs]
> > > [c000dbd4fd10] d0598510 .rpc_release_calldata+0x50/0x74 
> > > [sunrpc]
> > > [c000dbd4fda0] c008d960 .run_workqueue+0x10c/0x1f4
> > > [c000dbd4fe50] c008ec70 .worker_thread+0x118/0x138
> > > [c000dbd4ff00] c00939f4 .kthread+0x78/0xc4
> > > [c000dbd4ff90] c002b060 .kernel_thread+0x4c/0x68
>
> Could you try with the attached patch.
[snip]
> Fix is to move the call to nfs_sb_deactive() into
> nfs_async_unlink_release().

I really doubt that will fix it.

My stacktrace was like:
run_workqueue
called: rpc_async_schedule
  that called: rpc_release_calldata
which points to: nfs_async_unlink_release
   that called: nfs_free_unlinkdata

So it does not matter for me if nfs_sb_deactive is called one step earlier.

Currently building with SLAB instead of SLUB to see if lockdep tells something...

Torsten

Re: High priority tasks break SMP balancer?

2007-11-17 Thread Dmitry Adamushko

Micah,

ok, would it be possible to get "cat /proc/schedstat" output at the
moment when you observe the 'problem'? So we could try to analyze
behavior of the load balancer (yeah, we should have probably started
with this step)

something like this:

(the problem appears)
# cat /proc/schedstat

... wait either a few seconds or until the problem disappears
(whatever comes first)
# cat /proc/schedstat


TIA,

>
> --Micah
>

-- 
Best regards,
Dmitry Adamushko

Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

2007-11-17 Thread Trond Myklebust


On Sat, 2007-11-17 at 18:53 +0100, Torsten Kaiser wrote:
> On Nov 16, 2007 3:15 PM, Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
> > Hi Andrew,
> >
> > The kernel enters the xmon state while running the file system
> > stress on nfs v4 mounted partition.
> [snip]
> > 0:mon> t
> > [c000dbd4fb50] c0069768 .__wake_up+0x54/0x88
> > [c000dbd4fc00] d086b890 .nfs_sb_deactive+0x44/0x58 [nfs]
> > [c000dbd4fc80] d0872658 .nfs_free_unlinkdata+0x2c/0x74 [nfs]
> > [c000dbd4fd10] d0598510 .rpc_release_calldata+0x50/0x74 [sunrpc]
> > [c000dbd4fda0] c008d960 .run_workqueue+0x10c/0x1f4
> > [c000dbd4fe50] c008ec70 .worker_thread+0x118/0x138
> > [c000dbd4ff00] c00939f4 .kthread+0x78/0xc4
> > [c000dbd4ff90] c002b060 .kernel_thread+0x4c/0x68

Could you try with the attached patch.

Cheers
  Trond
--- Begin Message ---
We should really only be calling nfs_sb_deactive() at the end of an RPC
call, to balance the nfs_sb_active() call in nfs_do_call_unlink(). OTOH,
nfs_free_unlinkdata() can be called from a variety of other situations.

Fix is to move the call to nfs_sb_deactive() into
nfs_async_unlink_release().

Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
---

 fs/nfs/unlink.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c
index b97d3bb..c90862a 100644
--- a/fs/nfs/unlink.c
+++ b/fs/nfs/unlink.c
@@ -31,7 +31,6 @@ struct nfs_unlinkdata {
 static void
 nfs_free_unlinkdata(struct nfs_unlinkdata *data)
 {
-   nfs_sb_deactive(NFS_SERVER(data->dir));
iput(data->dir);
put_rpccred(data->cred);
kfree(data->args.name.name);
@@ -116,6 +115,7 @@ static void nfs_async_unlink_release(void *calldata)
struct nfs_unlinkdata   *data = calldata;
 
nfs_dec_sillycount(data->dir);
+   nfs_sb_deactive(NFS_SERVER(data->dir));
nfs_free_unlinkdata(data);
 }
 
--- End Message ---
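
The balance rule in the patch description can be modelled in a few lines of
userspace C. This is only a sketch under stated assumptions: the struct and
function names below are hypothetical stand-ins, not the NFS client code. It
shows the reference being taken when the async call starts and dropped only in
the release callback, so that the plain free routine stays safe to call from
paths that never took a reference.

```c
#include <assert.h>

/* Hypothetical stand-in for the superblock "active" counter. */
struct sb_model {
    int active;
};

/* Hypothetical stand-in for the per-unlink data. */
struct unlink_model {
    struct sb_model *sb;
    int freed;
};

static void sb_active(struct sb_model *sb)   { sb->active++; }
static void sb_deactive(struct sb_model *sb) { sb->active--; }

/* Only frees the data; may be reached from paths that took no reference. */
static void free_unlinkdata(struct unlink_model *d)
{
    d->freed = 1;
}

/* Start of the async call: take the reference. */
static void do_call_unlink(struct unlink_model *d)
{
    sb_active(d->sb);
}

/* End of the async call: drop the reference, then free the data. */
static void async_unlink_release(struct unlink_model *d)
{
    sb_deactive(d->sb);
    free_unlinkdata(d);
}
```

With this split, every sb_active() has exactly one matching sb_deactive(),
regardless of how many other callers use the free routine.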

Re: [PATCH v3 17/17] (Avoid overload)

2007-11-17 Thread Steven Rostedt


On Sat, 17 Nov 2007, Gregory Haskins wrote:

> >>> On Sat, Nov 17, 2007 at  1:33 AM, in message
> > This patch changes the searching for a run queue by a waking RT task
> > to try to pick another runqueue if the currently running task
> > is an RT task.
> >
> > The reason is that RT tasks behave different than normal
> > tasks. Preempting a normal task to run a RT task to keep
> > its cache hot is fine, because the preempted non-RT task
> > may wait on that same runqueue to run again unless the
> > migration thread comes along and pulls it off.
> >
> > RT tasks behave differently. If one is preempted, it makes
> > an active effort to continue to run. So by having a high
> > priority task preempt a lower priority RT task, that lower
> > RT task will then quickly try to run on another runqueue.
> > This will cause that lower RT task to replace its nice
> > hot cache (and TLB) with a completely cold one. This is
> > for the hope that the new high priority RT task will keep
> >  its cache hot.
> >
> > Remember that this high priority RT task was just woken up.
> > So it may likely have been sleeping for several milliseconds,
> > and will end up with a cold cache anyway. RT tasks run till
> > they voluntarily stop, or are preempted by a higher priority
> > task. This means that it is unlikely that the woken RT task
> > will have a hot cache to wake up to. So pushing off a lower
> > RT task is just killing its cache for no good reason.
>
> You make some excellent points here.  Out of curiosity, have you tried a 
> comparison to see if it helps?

hehe, I was waiting for the "where's the numbers". Right now I don't have
them. Mainly because I don't have boxes with more than 4 CPUs on them. And
4 really doesn't make much of a difference.

If others out there would like to test, I'll write up a couple of versions
of this code and that way we can really get numbers for them.

Or perhaps someone would like to send me a 16way box. Although my wife
would kill me. ;-)

Hmm, I'm sure I can get to our lab and run some tests too.

-- Steve
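
The selection policy described in the quoted patch text can be sketched as a
small userspace model. This is a simplification under stated assumptions, not
the scheduler code: cpu_model and pick_wake_cpu are hypothetical names, and
the model ignores priorities and affinity masks. It only captures the idea
that a waking RT task should avoid a CPU whose current task is also RT when
some other CPU is running a non-RT task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state: is the currently running task an RT task? */
struct cpu_model {
    bool curr_is_rt;
};

/*
 * Pick a CPU for a waking RT task.
 * - If the wakeup CPU is running a non-RT task, stay there (preempting it
 *   is cheap, the preempted task just waits on the same runqueue).
 * - Otherwise prefer the first CPU that is running a non-RT task.
 * - If every CPU is running an RT task, fall back to the wakeup CPU.
 */
static size_t pick_wake_cpu(const struct cpu_model *cpus, size_t ncpus,
                            size_t this_cpu)
{
    if (!cpus[this_cpu].curr_is_rt)
        return this_cpu;

    for (size_t i = 0; i < ncpus; i++) {
        if (!cpus[i].curr_is_rt)
            return i;
    }
    return this_cpu;
}
```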


Re: [Patch] Fix UML broken (was Re: User Mode Linux still broken in 2.6.23.1)

2007-11-17 Thread Jeff Dike

On Fri, Nov 16, 2007 at 04:00:22PM -0600, Rob Landley wrote:
> I wasn't cc'd, and missed it.  I'd like to test this, do you have a
> link?  (Or a bit more specificity than "a few weeks ago"?)

Here are the three patches:

http://marc.info/?l=linux-arch&m=119342916329510&w=2
http://marc.info/?l=linux-kernel&m=119342916529516&w=2
http://marc.info/?l=linux-kernel&m=119342708426910&w=2

Jeff

-- 
Work email - jdike at linux dot intel dot com

Re: [PATCH v3 16/17] Fix schedstat handling

2007-11-17 Thread Steven Rostedt


On Sat, 17 Nov 2007, Gregory Haskins wrote:

> >>> On Sat, Nov 17, 2007 at  1:21 AM, in message
> <[EMAIL PROTECTED]>, Steven Rostedt <[EMAIL PROTECTED]>
> wrote:
> > Gregory Haskins RT balancing broke sched domains.
>
> Doh! (though you mean s/domains/stats ;)

Heh, indeed.

>
> > This is a fix to allow it to still work.
> >
> > Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
> >
> > ---
> >  include/linux/sched.h   |3 ++-
> >  kernel/sched.c  |   17 ++---
> >  kernel/sched_fair.c |   19 ++-
> >  kernel/sched_idletask.c |3 ++-
> >  kernel/sched_rt.c   |3 ++-
> >  5 files changed, 34 insertions(+), 11 deletions(-)
> >
> > Index: linux-compile.git/kernel/sched.c
> > ===
> > --- linux-compile.git.orig/kernel/sched.c   2007-11-17 00:15:57.0 
> > -0500
> > +++ linux-compile.git/kernel/sched.c2007-11-17 00:15:57.0 
> > -0500
> > @@ -1453,6 +1453,7 @@ static int try_to_wake_up(struct task_st
> > unsigned long flags;
> > long old_state;
> > struct rq *rq;
> > +   struct sched_domain *this_sd = NULL;
> >  #ifdef CONFIG_SMP
> > int new_cpu;
> >  #endif
> > @@ -1476,10 +1477,20 @@ static int try_to_wake_up(struct task_st
> > schedstat_inc(rq, ttwu_count);
> > if (cpu == this_cpu)
> > schedstat_inc(rq, ttwu_local);
> > -   else
> > -   schedstat_inc(rq->sd, ttwu_wake_remote);
> > +   else {
> > +#ifdef CONFIG_SCHEDSTATS
> > +   struct sched_domain *sd;
> > +   for_each_domain(this_cpu, sd) {
> > +   if (cpu_isset(cpu, sd->span)) {
> > +   schedstat_inc(sd, ttwu_wake_remote);
> > +   this_sd = sd;
> > +   break;
> > +   }
> > +   }
> > +#endif /* CONFIG_SCHEDSTATS */
> > +   }
> >
> > -   new_cpu = p->sched_class->select_task_rq(p, sync);
> > +   new_cpu = p->sched_class->select_task_rq(p, this_sd, sync);
>
> I like this optimization, but I am thinking that the location of the stat 
> update is now no longer relevant.  It should potentially go *after* the 
> select_task_rq() so that we pick the sched_domain of the actual wake target, 
> not the historical affinity.  If that is accurate, I'm sure you can finagle 
> this optimization to work in that scenario too, but it will take a little 
> re-work.

Yeah, I can take a deep look. This was written late at night, so I need to
spend more "awake" hours on it.

-- Steve
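
The review comment above, about accounting the remote wakeup against the
domain of the CPU that is actually chosen, can be sketched in userspace C.
This is a model under stated assumptions: domain_model, find_domain and
account_remote_wakeup are hypothetical names, the domains are a plain array
ordered from smallest to largest span, and each span is a bitmask that fits
in an unsigned long.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scheduling domain: a CPU bitmask plus one statistic. */
struct domain_model {
    unsigned long span;          /* bit N set means CPU N is in the domain */
    unsigned ttwu_wake_remote;   /* remote wakeups accounted to the domain */
};

/* Return the smallest domain whose span contains cpu, or NULL if none. */
static struct domain_model *find_domain(struct domain_model *doms, size_t n,
                                        unsigned cpu)
{
    for (size_t i = 0; i < n; i++) {
        if (doms[i].span & (1UL << cpu))
            return &doms[i];
    }
    return NULL;
}

/*
 * Account a remote wakeup against the domain that spans target_cpu.
 * Calling this with the CPU returned by the runqueue selection step, rather
 * than with the task's previous CPU, charges the domain of the real target.
 */
static struct domain_model *account_remote_wakeup(struct domain_model *doms,
                                                  size_t n,
                                                  unsigned target_cpu)
{
    struct domain_model *sd = find_domain(doms, n, target_cpu);

    if (sd)
        sd->ttwu_wake_remote++;
    return sd;
}
```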


Re: [PATCH v3 10/17] Remove some CFS specific code from the wakeup path of RT tasks

2007-11-17 Thread Steven Rostedt


On Sat, 17 Nov 2007, Gregory Haskins wrote:

> >>> On Sat, Nov 17, 2007 at  1:21 AM, in message
> <[EMAIL PROTECTED]>, Steven Rostedt <[EMAIL PROTECTED]>
> wrote:
>
> > +*/
> > +   if (idle_cpu(cpu) || cpu_rq(cpu)->nr_running > 1)
> > +   return cpu;
> > +
> > +   for_each_domain(cpu, sd) {
> > +   if (sd->flags & SD_WAKE_IDLE) {
> > +   cpus_and(tmp, sd->span, p->cpus_allowed);
> > +   for_each_cpu_mask(i, tmp) {
> > +   if (idle_cpu(i))
> > +   return i;
> 
> 
>
> Looks like some stuff that was added in 24 was inadvertently lost in the move 
> when you merged the patches up from 23.1-rt11.  The attached patch is updated 
> to move the new logic as well.
>

Doh!  Good catch. Will rework on Monday.

-- Steve


Re: [PATCH 7/8] UML - add virt_to_pte

2007-11-17 Thread Jeff Dike

On Sat, Nov 17, 2007 at 11:50:07AM +0100, Roel Kluin wrote:
> > +   if (!pte_present(*pte))
> > +   pte = NULL;
> 
> shouldn't you check again for (pte == NULL)?

No, because if the page isn't mapped, handle_page_fault would have
returned non-zero, and we would have already returned.

This is leaving aside issues of whether the page could have been
unmapped by another CPU (which isn't an issue right now, and for which
I have a patch to fix anyway).

Jeff

-- 
Work email - jdike at linux dot intel dot com

Re: [patch] Printk kernel version in WARN_ON

2007-11-17 Thread Andrew Morton

On Sat, 17 Nov 2007 10:39:47 -0800 Arjan van de Ven <[EMAIL PROTECTED]> wrote:

> On Sat, 17 Nov 2007 10:27:20 -0800
> Andrew Morton <[EMAIL PROTECTED]> wrote:
> 
> > On Sat, 17 Nov 2007 10:15:52 -0800 Arjan van de Ven
> > <[EMAIL PROTECTED]> wrote:
> > 
> > > @@ -35,8 +36,8 @@ struct bug_entry {
> > >  #define WARN_ON(condition) ({                                   \
> > >         int __ret_warn_on = !!(condition);                      \
> > >         if (unlikely(__ret_warn_on)) {                          \
> > > -               printk("WARNING: at %s:%d %s()\n", __FILE__,    \
> > > -                       __LINE__, __FUNCTION__);                \
> > > +               printk("WARNING: at %s:%d %s()  (%s)\n", __FILE__, \
> > > +                       __LINE__, __FUNCTION__, UTS_RELEASE);   \
> > >                 dump_stack();                                   \
> > >         }                                                       \
> > >         unlikely(__ret_warn_on);                                \
> > 
> > that made our 1100-odd WARN_ON sites fatter.
> 
> by ... not too much at least, gcc ought to be quite good at merging
> same-strings into one, so it's just one extra pointer argument
> 

I think I knew that.  At 1000 callsites.

Re: Bogus PCI vendor ID

2007-11-17 Thread Francois Romieu

Kai Ruhnau <[EMAIL PROTECTED]> :
[...]
> I have a problem with two of my PCI devices showing the wrong PCI vendor
> ID (0001) in vanilla kernels.
>
> My system currently runs a 32 bit x86 kernel built from ubuntu sources
> 2.6.20.3-ubuntu1. This is the only kernel I have found so far that shows
> the correct PCI IDs:
> 
> ~ # lspci -n
> 00:00.0 0600: 1002:7930
> 00:02.0 0604: 1002:7933
> 00:06.0 0604: 1002:7936
> 00:12.0 0106: 1002:4380
> 00:13.0 0c03: 1002:4387
> 00:13.1 0c03: 1002:4388
> 00:13.2 0c03: 1002:4389
> 00:13.3 0c03: 1002:438a
> 00:13.4 0c03: 1002:438b
> 00:13.5 0c03: 1002:4386
> 00:14.0 0c05: 1002:4385 (rev 13)
> 00:14.1 0101: 1002:438c
> 00:14.2 0403: 1002:4383
> 00:14.3 0601: 1002:438d
> 00:14.4 0604: 1002:4384
> 01:00.0 0300: 10de:0193 (rev a2)
> 02:00.0 0200: 11ab:4364 (rev 12)
> 03:02.0 0c00: 104c:8024
> 
> I tested several vanilla kernels: 2.6.23.1, 2.6.23, 2.6.22, 2.6.21 and
> somewhere between 2.6.20 and 2.6.21 via bisect. There, the output of
> lspci is as follows:
> 
> ~ # lspci -n
[snip]
> 00:14.4 0604: 1002:4384
> 01:00.0 0300: 0001:0193 (rev a2)
> 02:00.0 0200: 0001:4364 (rev 12)
> 03:02.0 0c00: 104c:8024
> 
> Note the two vendor IDs 0001 for 01:00.0 and 02:00.0.
> 
> Since these two devices are my graphics card (PCI express) and network
> card (builtin) respectively I have quite some trouble running my system
> without the right vendor IDs ;-)
> Can this be fixed ?

No idea but it seems to be plaguing us:

- sky2

Kai Ruhnau <[EMAIL PROTECTED]>
[...]
> However, I just booted 2.6.23.1 and additionally checked the output of
> lspci against that from 2.6.20-ubuntu.
> Between both versions the vendor code of my network device changes from
> 11ab to 0001. Why that?

- r8169 + Abit fatal1ty motherboard

(so far nobody reported a 8169 on a fatal1ty motherboard with a
sensible vendor id)

Josh Logan <[EMAIL PROTECTED]>
[...]
> 2.6.20 and above, maybe .18 or .19.
>
> I have never seen the card recognized by a stock kernel.

Ciaran McCreesh <[EMAIL PROTECTED]>
[...]
> I've used 2.6.21.6 and 2.6.24-rc1, both gave 0x0001. I just tried
> 2.6.18.8 as well, still 0x0001. With earlier kernels my SATA devices

-- 
Ueimor
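
For anyone collecting more reports, a small userspace helper can flag
affected devices in "lspci -n" output. This is a sketch under stated
assumptions: parse_lspci_n and vendor_looks_bogus are hypothetical names, the
line layout is taken from the listings quoted in this thread
("bus:dev.fn class: vendor:device"), and treating 0001 as bogus reflects only
what was reported here.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse one line of "lspci -n" output, for example
 *   "01:00.0 0300: 0001:0193 (rev a2)"
 * Returns 1 and fills vendor/device on success, 0 if the line does not match.
 */
static int parse_lspci_n(const char *line, unsigned *vendor, unsigned *device)
{
    unsigned bus, dev, fn, cls;

    return sscanf(line, "%x:%x.%x %x: %x:%x",
                  &bus, &dev, &fn, &cls, vendor, device) == 6;
}

/* The reports in this thread all show the vendor ID replaced by 0001. */
static int vendor_looks_bogus(unsigned vendor)
{
    return vendor == 0x0001;
}
```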

Re: [patch] Printk kernel version in WARN_ON

2007-11-17 Thread Arjan van de Ven

On Sat, 17 Nov 2007 10:27:20 -0800
Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Sat, 17 Nov 2007 10:15:52 -0800 Arjan van de Ven
> <[EMAIL PROTECTED]> wrote:
> 
> > @@ -35,8 +36,8 @@ struct bug_entry {
> >  #define WARN_ON(condition) ({                                   \
> >         int __ret_warn_on = !!(condition);                      \
> >         if (unlikely(__ret_warn_on)) {                          \
> > -               printk("WARNING: at %s:%d %s()\n", __FILE__,    \
> > -                       __LINE__, __FUNCTION__);                \
> > +               printk("WARNING: at %s:%d %s()  (%s)\n", __FILE__, \
> > +                       __LINE__, __FUNCTION__, UTS_RELEASE);   \
> >                 dump_stack();                                   \
> >         }                                                       \
> >         unlikely(__ret_warn_on);                                \
> 
> that made our 1100-odd WARN_ON sites fatter.

by ... not too much at least, gcc ought to be quite good at merging
same-strings into one, so it's just one extra pointer argument



-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

[PATCH 4/4, v3] ACPI, PCI: ACPI PCI slot detection driver

2007-11-17 Thread Alex Chiang

Detect all physical PCI slots as described by ACPI, and create
entries in /sys/bus/pci/slots/.

Not all physical slots are hotpluggable, and the acpiphp module
does not detect them. Now we know the physical PCI geography of
our system, without caring about hotplug.

v2 -> v3:
Add Kconfig option to driver, allowing users to [de]config
this driver. If configured, take slightly different code
paths in pci_hp_register and pci_hp_deregister.

v1 -> v2:
Now recursively discovering p2p bridges and slots
underneath them. Hopefully, this will prevent us
from trying to register the same slot multiple times.

Signed-off-by: Alex Chiang <[EMAIL PROTECTED]>
---
 drivers/acpi/Kconfig   |9 ++
 drivers/acpi/Makefile  |1 +
 drivers/acpi/pci_slot.c|  203 
 drivers/pci/hotplug/pci_hotplug_core.c |   15 +++
 4 files changed, 228 insertions(+), 0 deletions(-)
 create mode 100644 drivers/acpi/pci_slot.c

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 087a702..b1ce260 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -293,6 +293,15 @@ config ACPI_EC
  the battery and thermal drivers.  If you are compiling for a 
  mobile system, say Y.
 
+config ACPI_PCI_SLOT
+   bool "PCI slot detection driver"
+   default n
+   help
+ This driver will attempt to discover all PCI slots in your system,
+ and creates entries in /sys/bus/pci/slots/. This feature can
+ help you correlate PCI bus addresses with the physical geography
+ of your slots. If you are unsure, say N.
+
 config ACPI_POWER
bool
default y
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 54e3ab0..d89000e 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_ACPI_DOCK)   += dock.o
 obj-$(CONFIG_ACPI_BAY) += bay.o
 obj-$(CONFIG_ACPI_VIDEO)   += video.o
 obj-y  += pci_root.o pci_link.o pci_irq.o pci_bind.o
+obj-$(CONFIG_ACPI_PCI_SLOT)+= pci_slot.o
 obj-$(CONFIG_ACPI_POWER)   += power.o
 obj-$(CONFIG_ACPI_PROCESSOR)   += processor.o
 obj-$(CONFIG_ACPI_CONTAINER)   += container.o
diff --git a/drivers/acpi/pci_slot.c b/drivers/acpi/pci_slot.c
new file mode 100644
+index 0000000..22f076b
--- /dev/null
+++ b/drivers/acpi/pci_slot.c
@@ -0,0 +1,203 @@
+/*
+ *  pci_slot.c - ACPI PCI Slot Driver
+ *
+ *  The code here is heavily leveraged from the acpiphp module.
+ *  Thanks to Matthew Wilcox <[EMAIL PROTECTED]> for much guidance.
+ *
+ *  Copyright (C) 2007 Alex Chiang <[EMAIL PROTECTED]>
+ *  Copyright (C) 2007 Hewlett-Packard Development Company, L.P.
+ *
+ *  This program is free software; you can redistribute it and/or modify it
+ *  under the terms and conditions of the GNU General Public License,
+ *  version 2, as published by the Free Software Foundation.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *  General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License along
+ *  with this program; if not, write to the Free Software Foundation, Inc.,
+ *  51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define _COMPONENT ACPI_PCI_COMPONENT
+ACPI_MODULE_NAME("pci_slot");
+
+#define MY_NAME "pci_slot"
+#define err(format, arg...) printk(KERN_ERR "%s: " format , MY_NAME , ## arg)
+#define info(format, arg...) printk(KERN_INFO "%s: " format , MY_NAME , ## arg)
+
+static int acpi_pci_slot_add(acpi_handle handle);
+static void acpi_pci_slot_remove(acpi_handle handle);
+
+static struct acpi_pci_driver acpi_pci_slot_driver = {
+   .add = acpi_pci_slot_add,
+   .remove = acpi_pci_slot_remove,
+};
+
+/*
+ * register_slot - callback function to discover / create physical PCI slots
+ * @handle: any device underneath an acpi_pci_root (sometimes it's a slot
+ * device, sometimes not)
+ * @context: struct pci_bus
+ * The possible error conditions are non-fatal, so we always return
+ * AE_OK, as to not terminate our namespace walk prematurely.
+ */
+static acpi_status
+register_slot(acpi_handle handle, u32 lvl, void *context, void **rv)
+{
+   int device;
+   unsigned long adr, sun;
+   acpi_status status;
+   char name[KOBJ_NAME_LEN];
+
+   struct pci_slot *pci_slot;
+   struct pci_bus *pci_bus = context;
+
+   status = acpi_evaluate_integer(handle, "_ADR", NULL, &adr);
+   if (ACPI_FAILURE(status))
+   return AE_OK;
+   device = (adr >> 16) & 0xffff;
+
+   /* No _SUN == not a slot == bail */
+   status = acpi_evaluate_integer(handle, "_SUN", NULL, &sun);
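
For reference, the _ADR value evaluated in register_slot() encodes a PCI
address with the device number in the high 16 bits and the function number in
the low 16 bits, where a function value of 0xffff refers to all functions of
the device. A minimal userspace sketch of that decoding follows; the helper
names adr_to_device and adr_to_function are hypothetical and not part of the
patch.

```c
#include <assert.h>

/* High word of a PCI _ADR value: the device number. */
static unsigned adr_to_device(unsigned long adr)
{
    return (adr >> 16) & 0xffff;
}

/* Low word of a PCI _ADR value: the function number (0xffff = all). */
static unsigned adr_to_function(unsigned long adr)
{
    return adr & 0xffff;
}
```

For example, an _ADR of 0x00030002 denotes device 3, function 2.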
+

[PATCH 3/4, v3] PCI, PCI Hotplug: Introduce pci_slot

2007-11-17 Thread Alex Chiang

  - Make pci_slot the primary sysfs entity. hotplug_slot becomes a
subsidiary structure.
o pci_create_slot() creates and registers a slot with the PCI core
o pci_slot_add_hotplug() gives it hotplug capability

  - Change the prototype of pci_hp_register() to take the bus and
slot number (on parent bus) as parameters.

  - Remove all the ->get_address methods since this functionality is
now handled by pci_slot directly.

v2 -> v3:
Separated slot creation and slot hotplug ability into two
interfaces. Fixed bugs in pci_destroy_slot(), and now
properly calling from pci_hp_deregister.

v1 -> v2:
No change

Signed-off-by: Alex Chiang <[EMAIL PROTECTED]>
Signed-off-by: Matthew Wilcox <[EMAIL PROTECTED]>
---
 drivers/pci/Makefile|2 +-
 drivers/pci/hotplug/acpiphp.h   |1 -
 drivers/pci/hotplug/acpiphp_core.c  |   23 +---
 drivers/pci/hotplug/acpiphp_glue.c  |   16 --
 drivers/pci/hotplug/acpiphp_ibm.c   |5 +-
 drivers/pci/hotplug/cpci_hotplug_core.c |2 +-
 drivers/pci/hotplug/cpqphp_core.c   |4 +-
 drivers/pci/hotplug/fakephp.c   |2 +-
 drivers/pci/hotplug/ibmphp_ebda.c   |3 +-
 drivers/pci/hotplug/pci_hotplug_core.c  |  242 +++
 drivers/pci/hotplug/pciehp_core.c   |   22 +--
 drivers/pci/hotplug/rpadlpar_sysfs.c|4 +-
 drivers/pci/hotplug/sgi_hotplug.c   |2 +-
 drivers/pci/hotplug/shpchp_core.c   |   17 +--
 drivers/pci/pci.h   |   13 ++
 drivers/pci/slot.c  |  184 +++
 include/linux/pci.h |   17 ++
 include/linux/pci_hotplug.h |   12 +-
 18 files changed, 328 insertions(+), 243 deletions(-)
 create mode 100644 drivers/pci/slot.c

diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 5550556..12f0b2d 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -2,7 +2,7 @@
 # Makefile for the PCI bus specific drivers.
 #
 
-obj-y  += access.o bus.o probe.o remove.o pci.o quirks.o \
+obj-y  += access.o bus.o probe.o remove.o pci.o quirks.o slot.o \
pci-driver.o search.o pci-sysfs.o rom.o setup-res.o
 obj-$(CONFIG_PROC_FS) += proc.o
 
diff --git a/drivers/pci/hotplug/acpiphp.h b/drivers/pci/hotplug/acpiphp.h
index f6cc0c5..ab46189 100644
--- a/drivers/pci/hotplug/acpiphp.h
+++ b/drivers/pci/hotplug/acpiphp.h
@@ -216,7 +216,6 @@ extern u8 acpiphp_get_power_status (struct acpiphp_slot 
*slot);
 extern u8 acpiphp_get_attention_status (struct acpiphp_slot *slot);
 extern u8 acpiphp_get_latch_status (struct acpiphp_slot *slot);
 extern u8 acpiphp_get_adapter_status (struct acpiphp_slot *slot);
-extern u32 acpiphp_get_address (struct acpiphp_slot *slot);
 
 /* variables */
 extern int acpiphp_debug;
diff --git a/drivers/pci/hotplug/acpiphp_core.c 
b/drivers/pci/hotplug/acpiphp_core.c
index a0ca63a..34b8d0b 100644
--- a/drivers/pci/hotplug/acpiphp_core.c
+++ b/drivers/pci/hotplug/acpiphp_core.c
@@ -70,7 +70,6 @@ static int disable_slot   (struct hotplug_slot 
*slot);
 static int set_attention_status (struct hotplug_slot *slot, u8 value);
 static int get_power_status(struct hotplug_slot *slot, u8 *value);
 static int get_attention_status (struct hotplug_slot *slot, u8 *value);
-static int get_address (struct hotplug_slot *slot, u32 *value);
 static int get_latch_status(struct hotplug_slot *slot, u8 *value);
 static int get_adapter_status  (struct hotplug_slot *slot, u8 *value);
 
@@ -83,7 +82,6 @@ static struct hotplug_slot_ops acpi_hotplug_slot_ops = {
.get_attention_status   = get_attention_status,
.get_latch_status   = get_latch_status,
.get_adapter_status = get_adapter_status,
-   .get_address= get_address,
 };
 
 
@@ -279,23 +277,6 @@ static int get_adapter_status(struct hotplug_slot 
*hotplug_slot, u8 *value)
return 0;
 }
 
-
-/**
- * get_address - get pci address of a slot
- * @hotplug_slot: slot to get status
- * @value: pointer to struct pci_busdev (seg, bus, dev)
- */
-static int get_address(struct hotplug_slot *hotplug_slot, u32 *value)
-{
-   struct slot *slot = hotplug_slot->private;
-
-   dbg("%s - physical_slot = %s\n", __FUNCTION__, hotplug_slot->name);
-
-   *value = acpiphp_get_address(slot->acpi_slot);
-
-   return 0;
-}
-
 static int __init init_acpi(void)
 {
int retval;
@@ -362,7 +343,9 @@ int acpiphp_register_hotplug_slot(struct acpiphp_slot 
*acpiphp_slot)
acpiphp_slot->slot = slot;
snprintf(slot->name, sizeof(slot->name), "%u", slot->acpi_slot->sun);
 
-   retval = pci_hp_register(slot->hotplug_slot);
+   retval = pci_hp_register(slot->hotplug_slot,
+   acpiphp_slot->bridge->pci_bus,
+   acpiphp_slot->device);
if (retval) {
err("pci_hp_register failed

[PATCH 2/4, v3] PCI Hotplug: Construct one fakephp slot per pci slot

2007-11-17 Thread Alex Chiang

Register one slot per slot, rather than one slot per function.
Change the name of the slot to fake%d instead of the pci address.

Signed-off-by: Alex Chiang <[EMAIL PROTECTED]>
Signed-off-by: Matthew Wilcox <[EMAIL PROTECTED]>
---
 drivers/pci/hotplug/fakephp.c |   80 +++-
 1 files changed, 30 insertions(+), 50 deletions(-)

diff --git a/drivers/pci/hotplug/fakephp.c b/drivers/pci/hotplug/fakephp.c
index 027f686..996c942 100644
--- a/drivers/pci/hotplug/fakephp.c
+++ b/drivers/pci/hotplug/fakephp.c
@@ -63,6 +63,7 @@ struct dummy_slot {
struct list_head node;
struct hotplug_slot *slot;
struct pci_dev *dev;
+   char name[8];
 };
 
 static int debug;
@@ -93,6 +94,7 @@ static int add_slot(struct pci_dev *dev)
struct dummy_slot *dslot;
struct hotplug_slot *slot;
int retval = -ENOMEM;
+   static int count = 1;
 
slot = kzalloc(sizeof(struct hotplug_slot), GFP_KERNEL);
if (!slot)
@@ -106,13 +108,14 @@ static int add_slot(struct pci_dev *dev)
slot->info->max_bus_speed = PCI_SPEED_UNKNOWN;
slot->info->cur_bus_speed = PCI_SPEED_UNKNOWN;
 
-   slot->name = &dev->dev.bus_id[0];
-   dbg("slot->name = %s\n", slot->name);
-
dslot = kmalloc(sizeof(struct dummy_slot), GFP_KERNEL);
if (!dslot)
goto error_info;
 
+   slot->name = dslot->name;
+   snprintf(slot->name, sizeof(dslot->name), "fake%d", count++);
+   dbg("slot->name = %s\n", slot->name);
+
slot->ops = _hotplug_slot_ops;
slot->release = _release;
slot->private = dslot;
@@ -141,17 +144,17 @@ error:
 static int __init pci_scan_buses(void)
 {
struct pci_dev *dev = NULL;
-   int retval = 0;
+   int lastslot = 0;
 
while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
-   retval = add_slot(dev);
-   if (retval) {
-   pci_dev_put(dev);
-   break;
-   }
+   if (PCI_FUNC(dev->devfn) > 0 &&
+   lastslot == PCI_SLOT(dev->devfn))
+   continue;
+   lastslot = PCI_SLOT(dev->devfn);
+   add_slot(dev);
}
 
-   return retval;
+   return 0;
 }
 
 static void remove_slot(struct dummy_slot *dslot)
@@ -275,23 +278,9 @@ static int enable_slot(struct hotplug_slot *hotplug_slot)
return -ENODEV;
 }
 
-/* find the hotplug_slot for the pci_dev */
-static struct hotplug_slot *get_slot_from_dev(struct pci_dev *dev)
-{
-   struct dummy_slot *dslot;
-
-   list_for_each_entry(dslot, _list, node) {
-   if (dslot->dev == dev)
-   return dslot->slot;
-   }
-   return NULL;
-}
-
-
 static int disable_slot(struct hotplug_slot *slot)
 {
struct dummy_slot *dslot;
-   struct hotplug_slot *hslot;
struct pci_dev *dev;
int func;
 
@@ -301,36 +290,27 @@ static int disable_slot(struct hotplug_slot *slot)
 
dbg("%s - physical_slot = %s\n", __FUNCTION__, slot->name);
 
-   /* don't disable bridged devices just yet, we can't handle them 
easily... */
-   if (dslot->dev->subordinate) {
-   err("Can't remove PCI devices with other PCI devices behind it 
yet.\n");
-   return -ENODEV;
-   }
-   /* search for subfunctions and disable them first */
-   if (!(dslot->dev->devfn & 7)) {
-   for (func = 1; func < 8; func++) {
-   dev = pci_get_slot(dslot->dev->bus,
-   dslot->dev->devfn + func);
-   if (dev) {
-   hslot = get_slot_from_dev(dev);
-   if (hslot)
-   disable_slot(hslot);
-   else {
-   err("Hotplug slot not found for 
subfunction of PCI device\n");
-   return -ENODEV;
-   }
-   pci_dev_put(dev);
-   } else
-   dbg("No device in slot found\n");
+   for (func = 7; func >= 0; func--) {
+   dev = pci_get_slot(dslot->dev->bus, dslot->dev->devfn + func);
+   if (!dev)
+   continue;
+
+   /* don't disable bridged devices just yet, we can't handle
+* them easily... */
+   if (dev->subordinate) {
+   err("Can't remove PCI devices with other PCI devices behind it yet.\n");
+   return -ENODEV;
}
-   }
 
-   /* remove the device from the pci core */
-   pci_remove_bus_device(dslot->dev);
 
-   /* blow away this sysfs entry and other parts. */
-   remove_slot(dslot);
+   /* remove the device from the pci core */
+

[PATCH 1/4, v3] PCI Hotplug: Remove path attribute from sgi_hotplug

2007-11-17 Thread Alex Chiang

Rename the slot to be the contents of the 'path' sysfs attribute, and
delete the attribute.  The mapping from pci address to slot name is
supposed to be done through the 'address' file, which will be provided
automatically later in this series of patches.

Signed-off-by: Alex Chiang <[EMAIL PROTECTED]>
Signed-off-by: Matthew Wilcox <[EMAIL PROTECTED]>
---
 drivers/pci/hotplug/sgi_hotplug.c |   32 +---
 1 files changed, 1 insertions(+), 31 deletions(-)

diff --git a/drivers/pci/hotplug/sgi_hotplug.c b/drivers/pci/hotplug/sgi_hotplug.c
index ef07c36..693519e 100644
--- a/drivers/pci/hotplug/sgi_hotplug.c
+++ b/drivers/pci/hotplug/sgi_hotplug.c
@@ -91,21 +91,6 @@ static struct hotplug_slot_ops sn_hotplug_slot_ops = {
 
 static DEFINE_MUTEX(sn_hotplug_mutex);
 
-static ssize_t path_show (struct hotplug_slot *bss_hotplug_slot,
- char *buf)
-{
-   int retval = -ENOENT;
-   struct slot *slot = bss_hotplug_slot->private;
-
-   if (!slot)
-   return retval;
-
-   retval = sprintf (buf, "%s\n", slot->physical_path);
-   return retval;
-}
-
-static struct hotplug_slot_attribute sn_slot_path_attr = __ATTR_RO(path);
-
 static int sn_pci_slot_valid(struct pci_bus *pci_bus, int device)
 {
struct pcibus_info *pcibus_info;
@@ -173,18 +158,10 @@ static int sn_hp_slot_private_alloc(struct hotplug_slot *bss_hotplug_slot,
return -ENOMEM;
bss_hotplug_slot->private = slot;
 
-   bss_hotplug_slot->name = kmalloc(SN_SLOT_NAME_SIZE, GFP_KERNEL);
-   if (!bss_hotplug_slot->name) {
-   kfree(bss_hotplug_slot->private);
-   return -ENOMEM;
-   }
+   bss_hotplug_slot->name = slot->physical_path;
 
slot->device_num = device;
slot->pci_bus = pci_bus;
-   sprintf(bss_hotplug_slot->name, "%04x:%02x:%02x",
-   pci_domain_nr(pci_bus),
-   ((u16)pcibus_info->pbi_buscommon.bs_persist_busnum),
-   device + 1);
 
sn_generate_path(pci_bus, slot->physical_path);
 
@@ -203,8 +180,6 @@ static struct hotplug_slot * sn_hp_destroy(void)
bss_hotplug_slot = slot->hotplug_slot;
list_del(&((struct slot *)bss_hotplug_slot->private)->
 hp_list);
-   sysfs_remove_file(&bss_hotplug_slot->kobj,
- &sn_slot_path_attr.attr);
break;
}
return bss_hotplug_slot;
@@ -653,11 +628,6 @@ static int sn_hotplug_slot_register(struct pci_bus *pci_bus)
rc = pci_hp_register(bss_hotplug_slot);
if (rc)
goto register_err;
-
-   rc = sysfs_create_file(&bss_hotplug_slot->kobj,
-  &sn_slot_path_attr.attr);
-   if (rc)
-   goto register_err;
}
dev_dbg(&pci_bus->self->dev, "Registered bus with hotplug\n");
return rc;
-- 
1.5.3.1.1.g1e61

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] do_task_stat: don't use task_pid_nr_ns() lockless

2007-11-17 Thread Oleg Nesterov

Without the rcu, tasklist, or siglock lock held, task_pid_nr_ns() may read freed
memory. Move the call site under ->siglock.

Sadly, we can report pid == 0 if the task was detached.

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

--- 24/fs/proc/array.c~dtst 2007-11-09 12:57:30.0 +0300
+++ 24/fs/proc/array.c  2007-11-17 21:26:55.0 +0300
@@ -392,7 +392,7 @@ static int do_task_stat(struct task_stru
sigset_t sigign, sigcatch;
char state;
int res;
-   pid_t ppid = 0, pgid = -1, sid = -1;
+   pid_t pid = 0, ppid = 0, pgid = -1, sid = -1;
int num_threads = 0;
struct mm_struct *mm;
unsigned long long start_time;
@@ -403,9 +403,6 @@ static int do_task_stat(struct task_stru
unsigned long rsslim = 0;
char tcomm[sizeof(task->comm)];
unsigned long flags;
-   struct pid_namespace *ns;
-
-   ns = current->nsproxy->pid_ns;
 
state = *get_task_state(task);
vsize = eip = esp = 0;
@@ -425,6 +422,7 @@ static int do_task_stat(struct task_stru
 
rcu_read_lock();
if (lock_task_sighand(task, &flags)) {
+   struct pid_namespace *ns = current->nsproxy->pid_ns;
struct signal_struct *sig = task->signal;
 
if (sig->tty) {
@@ -461,6 +459,7 @@ static int do_task_stat(struct task_stru
gtime = cputime_add(gtime, sig->gtime);
}
 
+   pid = task_pid_nr_ns(task, ns);
sid = task_session_nr_ns(task, ns);
pgid = task_pgrp_nr_ns(task, ns);
ppid = task_ppid_nr_ns(task, ns);
@@ -495,7 +494,7 @@ static int do_task_stat(struct task_stru
res = sprintf(buffer, "%d (%s) %c %d %d %d %d %d %u %lu \
 %lu %lu %lu %lu %lu %ld %ld %ld %ld %d 0 %llu %lu %ld %lu %lu %lu %lu %lu \
 %lu %lu %lu %lu %lu %lu %lu %lu %d %d %u %u %llu %lu %ld\n",
-   task_pid_nr_ns(task, ns),
+   pid,
tcomm,
state,
ppid,


Soft lockups since stable kernel upgrade to 2.6.23.8

2007-11-17 Thread Javier Kohen

I upgraded today from 2.6.23 to 2.6.23.8 and started seeing a lot of
these in the logs:

BUG: soft lockup detected on CPU#0!
 [] update_process_times+0x32/0x54
 [] tick_sched_timer+0x5e/0x99
 [] hrtimer_interrupt+0x112/0x197
 [] tick_sched_timer+0x0/0x99
 [] smp_apic_timer_interrupt+0x60/0x6f
 [] acpi_hw_register_write+0x118/0x148
 [] apic_timer_interrupt+0x28/0x30
 [] acpi_safe_halt+0x14/0x20 [processor]
 [] acpi_processor_idle+0x134/0x387 [processor]
 [] cpu_idle+0x46/0x59
 [] start_kernel+0x23c/0x241
 [] unknown_bootoption+0x0/0x196
 ===
BUG: soft lockup detected on CPU#0!
 [] update_process_times+0x32/0x54
 [] tick_sched_timer+0x5e/0x99
 [] hrtimer_interrupt+0x112/0x197
 [] tick_sched_timer+0x0/0x99
 [] smp_apic_timer_interrupt+0x60/0x6f
 [] apic_timer_interrupt+0x28/0x30
 ===
BUG: soft lockup detected on CPU#0!
 [] update_process_times+0x32/0x54
 [] fill_window+0x29d/0x384
 [] tick_sched_timer+0x5e/0x99
 [] hrtimer_interrupt+0x112/0x197
 [] tick_sched_timer+0x0/0x99
 [] zlib_inflate_table+0x1d9/0x4c0
 [] zlib_inflate_table+0x1d9/0x4c0
 [] tick_do_broadcast+0x1f/0x3f
 [] tick_handle_oneshot_broadcast+0x47/0x7f
 [] timer_interrupt+0x1a/0x20
 [] handle_IRQ_event+0x1a/0x3f
 [] handle_edge_irq+0x8b/0xd7
 [] do_IRQ+0x53/0x6c
 [] tick_notify+0x161/0x220
 [] common_interrupt+0x23/0x28
 [] acpi_processor_idle+0x22c/0x387 [processor]
 [] cpu_idle+0x46/0x59
 [] start_kernel+0x23c/0x241
 [] unknown_bootoption+0x0/0x196

I'm getting them in the hundreds but I had never seen them before this
upgrade. CPU is a single CPU, single core AMD Turion running in 32-bit
mode. Apparently they only occur when the ondemand governor is used. I
switched to the powersave and the performance governors for a while and
didn't see any message, but as soon as I went back to ondemand, the
messages started showing up again.

I see the problem might have to do with timers. In case it's relevant,
the available clock sources are acpi_pm pit jiffies tsc, of which
acpi_pm is the current one in use. I'm including the kernel config as
well.

Please CC, since I'm not subscribed to this list.

Modules Loaded: nls_iso8859_1 nls_cp437 vfat fat radeon drm af_packet
binfmt_misc capability commoncap ipv6 iptable_mangle iptable_filter
ip_tables x_tables ext2 snd_seq_dummy snd_seq_oss snd_seq_midi
snd_rawmidi snd_seq_midi_event snd_seq snd_seq_device cpufreq_ondemand
cpufreq_conservative cpufreq_powersave powernow_k8 freq_table snd_atiixp
snd_atiixp_modem snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss
snd_pcm snd_timer battery ac snd yenta_socket rsrc_nonstatic pcmcia_core
tifm_7xx1 tifm_core button soundcore snd_page_alloc psmouse pcspkr evdev
k8temp hwmon rtc sha256 aes dm_crypt dm_mirror dm_snapshot dm_mod sg
sd_mod sr_mod cdrom 8139cp usb_storage ohci1394 pata_atiixp 8139too mii
bitrev crc32 ehci_hcd ieee1394 libata ohci_hcd usbcore thermal processor
fan


CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
CONFIG_SLUB=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_KMOD=y
CONFIG_BLOCK=y
CONFIG_LSF=y

CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_CFQ=y
CONFIG_DEFAULT_IOSCHED="cfq"

CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_X86_PC=y
CONFIG_MK8=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_XADD=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_X86_MINIMUM_CPU_FAMILY=4
CONFIG_HPET_TIMER=y
CONFIG_PREEMPT_VOLUNTARY=y
CONFIG_X86_UP_APIC=y
CONFIG_X86_UP_IOAPIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_NONFATAL=y
CONFIG_VM86=y
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m

CONFIG_DMIID=y

[PATCH 0/4, v3] Physical PCI slot objects

2007-11-17 Thread Alex Chiang

Hi all,

This is v3 of the pci_slot patch series.

The major change is making the ACPI-PCI slot driver a Kconfig
option, as per the recommendations of others (Gary, Kenji-san).
In the process of doing so, it made sense to collapse the former
3/5 and 4/5 patches into a single 3/4 patch. There really wasn't
a reason to introduce a pci_slot patch, and then immediately
follow it with another patch modifying its interface; logically,
the changes should have been in the same patch.

Combining the patches also has the nice side benefit of keeping
the tree fully buildable and bisectable at all stages of the series.

I have done quite a bit more testing, and verified that this
series plays nicely with acpiphp during all stages of the series.
Notably, you can modprobe/rmmod acpiphp repeatedly no matter
where you are in the series, and no matter whether you have
CONFIG_ACPI_PCI_SLOT turned on. The correct entries in
/sys/bus/pci/slots/ will appear and disappear, and we correctly
register/deregister ACPI slots with the pci_hp core. 

Of course, if you *do* have the ACPI-PCI slot driver configured,
the slots/ entries in sysfs will be persistent. What you will see
is the hotplug attributes appear/disappear, depending on whether
you have acpiphp loaded or not.

Thanks for your consideration and all the feedback comments.
They're appreciated.

/ac

v2 -> v3:
  Patch 1/4 - no change
  Patch 2/4 - incorporate Eike's comments around snprintf
  Patch 3/4 - Separated slot creation and slot hotplug ability
  into two interfaces. Fixed bugs in pci_destroy_slot(), 
  and now properly calling from pci_hp_deregister.
  Patch 4/4 - Add Kconfig option to driver, allowing users to
  [de]config this driver. If configured, take slightly 
  different code paths in pci_hp_register and pci_hp_deregister.

v1 -> v2:
  Patch 1/5 - reworked to fix stupid compile bug
  Patch 2/5 - incorporate Eike, Linas, and Willy's comments
  Patch 3/5 - no change
  Patch 4/5 - was acpi-pci-slot-driver patch, now modifies
  pci_add_hotplug(). I changed the ordering on this so
  the tree doesn't break at this point in the series
  Patch 5/5 - now is acpi-pci-slot-driver patch, cleaned up
  implementation so our slot detection is a little
  better

Re: [patch] Printk kernel version in WARN_ON

2007-11-17 Thread Andrew Morton

On Sat, 17 Nov 2007 10:15:52 -0800 Arjan van de Ven <[EMAIL PROTECTED]> wrote:

> @@ -35,8 +36,8 @@ struct bug_entry {
>  #define WARN_ON(condition) ({  \
>   int __ret_warn_on = !!(condition);  \
>   if (unlikely(__ret_warn_on)) {  \
> - printk("WARNING: at %s:%d %s()\n", __FILE__,\
> - __LINE__, __FUNCTION__);\
> + printk("WARNING: at %s:%d %s()  (%s)\n", __FILE__,  \
> + __LINE__, __FUNCTION__, UTS_RELEASE);   \
>   dump_stack();   \
>   }   \
>   unlikely(__ret_warn_on);\

that made our 1100-odd WARN_ON sites fatter.

I suppose sometime we should optimise WARN_ON like we did BUG_ON.

Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

2007-11-17 Thread Andrew Morton

On Sat, 17 Nov 2007 19:09:46 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote:

> 
> * Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> 
> > Sadly lockdep does not work for me, as it gets turned off early:
> > [   39.851594] -
> > [   39.855963] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
> > [   39.861981] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
> > [   39.866963]  (>list_lock){-+..}, at: []
> 
> hey, that means it found a bug - which is not sad at all :-)
> 

mutter.

Torsten, you could try CONFIG_SLAB=y, CONFIG_SLUB=n to see if you can make
some progress on the NFS problem.


[patch] Printk kernel version in WARN_ON

2007-11-17 Thread Arjan van de Ven

Hi,

today, all oopses contain a version number of the kernel, which is nice
because the people who actually do bother to read the oops get this
vital bit of information always without having to ask the reporter in
another round trip.

However, WARN_ON() right now lacks this information; the patch below
adds this. This information is essential for getting people to use
their time effectively when looking at these things; in addition, it's
essential for tools that try to collect statistics about defects.
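
As a rough userspace illustration of the idea (not the kernel macro itself), the sketch below appends a release string to the warning header. DEMO_RELEASE, format_warning(), demo_warn_on() and DEMO_WARN_ON() are hypothetical names made up for this example; in the kernel the string would come from UTS_RELEASE and the output would go through printk().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's UTS_RELEASE string. */
#define DEMO_RELEASE "2.6.24-rc3"

/* Build the warning header, including the release string, into buf. */
static int format_warning(char *buf, size_t len, const char *file,
                          int line, const char *func)
{
    return snprintf(buf, len, "WARNING: at %s:%d %s()  (%s)\n",
                    file, line, func, DEMO_RELEASE);
}

/* Print the header when the condition holds; return the condition as 0/1. */
static int demo_warn_on(int condition, const char *file, int line,
                        const char *func)
{
    if (condition) {
        char msg[256];

        format_warning(msg, sizeof(msg), file, line, func);
        fputs(msg, stderr);
    }
    return !!condition;
}

/* Call-site wrapper: only the condition and location are expanded inline. */
#define DEMO_WARN_ON(cond) demo_warn_on(!!(cond), __FILE__, __LINE__, __func__)
```

With the formatting kept in one out-of-line helper, each call site only passes its location, so a tool parsing the log can pick the release out of the trailing parentheses.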

Please consider, maybe even for 2.6.24 since it's so simple and
important for long-term quality.

Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>

--- linux-2.6.24-rc3/include/asm-generic/bug.h.org  2007-11-17 09:55:00.0 -0800
+++ linux-2.6.24-rc3/include/asm-generic/bug.h  2007-11-17 10:11:23.0 -0800
@@ -2,6 +2,7 @@
 #define _ASM_GENERIC_BUG_H
 
 #include 
+#include 
 
 #ifdef CONFIG_BUG
 
@@ -35,8 +36,8 @@ struct bug_entry {
 #define WARN_ON(condition) ({  \
int __ret_warn_on = !!(condition);  \
if (unlikely(__ret_warn_on)) {  \
-   printk("WARNING: at %s:%d %s()\n", __FILE__,\
-   __LINE__, __FUNCTION__);\
+   printk("WARNING: at %s:%d %s()  (%s)\n", __FILE__,  \
+   __LINE__, __FUNCTION__, UTS_RELEASE);   \
dump_stack();   \
}   \
unlikely(__ret_warn_on);\


-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

2007-11-17 Thread Daniel Walker

On Sat, 2007-11-17 at 19:04 +0100, Ingo Molnar wrote:

> 
> split the list with you? Feel free to take any of those :-) dev->sem is 
> nontrivial and probably not possible right now - and some of the others 
> might be problematic too. But there might be fixable ones in the list. 
> This shouldn't become like the BKL conversion - never truly finished.

If I said I was going to do some set, at least it's more likely that the
work isn't duplicated..

What specifically is wrong with dev->sem ?

Daniel


Re: [PATCH][RFC] kprobes: Add user entry-handler in kretprobes

2007-11-17 Thread Abhishek Sagar

On Nov 17, 2007 6:24 AM, Jim Keniston <[EMAIL PROTECTED]> wrote:
> It'd be helpful to see others (especially kprobes maintainers) chime in
> on this.  In particular, if doing kmalloc/kfree of GFP_ATOMIC data at
> kretprobe-hit time is OK, as in Abhishek's approach, then we could also
> use GFP_ATOMIC (or at least GFP_NOWAIT) allocations to make up the
> difference when we run low on kretprobe_instances.

It might cause a problem with return instances having a large value of
entry_info_sz, being allocated in the page frame reclamation code
path.

> > > entry_info is, by default, a zero-length array, which adds nothing to
> > > the size of a uretprobe_instance -- at least on the 3 architectures I've
> > > tested on (i386, x86_64, powerpc).
> >
> > Strange, because from what I could gather, the data pouch patch had
> > the following in the kretprobe registration routine:
> >
> >
> > for (i = 0; i < rp->maxactive; i++) {
> > - inst = kmalloc(sizeof(struct kretprobe_instance), GFP_KERNEL);
> > + inst = kmalloc((sizeof(struct kretprobe_instance)
> > + + rp->entry_info_sz), GFP_KERNEL);
> >
> >
> > rp->entry_info_sz is presumably the size of the private data structure
> > of the registering module.
>
> ... which is zero for kretprobes that don't use the data pouch.
>
> > This is the bloat I was referring to. But
> > this difference should've showed up in your tests...?
>
> What bloat?  On my 32-bit system, the pouch to hold struct prof_data in
> your test_module example would be 20 bytes.  (For comparison,
> sizeof(struct kretprobe_instance) = 28, btw.)  Except for functions like
> schedule(), where a lot of tasks can be sleeping at the same time, an
> rp->maxactive value of 5 or 10 is typically plenty.  That's 100-200
> bytes of "bloat" spent at registration time (GFP_KERNEL), at least some
> of which will be saved at probe-hit time (GFP_ATOMIC).  (And if somebody
> says, "I always use a much higher value of rp->maxactive," then he/she's
> probably not really worried about bloat.)

Ok. Will make the necessary transition to registration time allocation
of private data.
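
A minimal userspace sketch of what registration-time allocation could look like, assuming the private data is carried as a trailing array sized by entry_info_sz. All demo_* names are hypothetical, and calloc()/free() stand in for the kernel allocators; this is not the kprobes implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-hit record with the caller's private data appended. */
struct demo_instance {
    int in_use;
    unsigned char entry_info[];   /* entry_info_sz bytes, may be zero */
};

struct demo_retprobe {
    int maxactive;                /* instances preallocated at registration */
    size_t entry_info_sz;         /* size of the private data per instance */
    struct demo_instance **pool;
};

static void demo_unregister(struct demo_retprobe *rp)
{
    if (!rp->pool)
        return;
    for (int i = 0; i < rp->maxactive; i++)
        free(rp->pool[i]);        /* free(NULL) is harmless */
    free(rp->pool);
    rp->pool = NULL;
}

/* Allocate every instance, private data included, up front. */
static int demo_register(struct demo_retprobe *rp)
{
    if (rp->maxactive <= 0)
        return -1;
    rp->pool = calloc((size_t)rp->maxactive, sizeof(*rp->pool));
    if (!rp->pool)
        return -1;
    for (int i = 0; i < rp->maxactive; i++) {
        rp->pool[i] = calloc(1, sizeof(struct demo_instance) +
                                rp->entry_info_sz);
        if (!rp->pool[i]) {
            demo_unregister(rp);
            return -1;
        }
    }
    return 0;
}

/* Probe-hit path: no allocation, just take a free instance or give up. */
static struct demo_instance *demo_get(struct demo_retprobe *rp)
{
    for (int i = 0; i < rp->maxactive; i++) {
        if (!rp->pool[i]->in_use) {
            rp->pool[i]->in_use = 1;
            return rp->pool[i];
        }
    }
    return NULL;
}

static void demo_put(struct demo_instance *inst)
{
    inst->in_use = 0;
}
```

The cost is maxactive * entry_info_sz bytes paid once at registration; the hit path then never allocates, and it returns NULL when the pool is exhausted.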

> Yes.  If the pouch idea is too weird, then the data pointer is a good
> compromise.
>
> With the above reservations, your enclosed patch looks OK.
>
> You should provide a patch #2 that updates Documentation/kprobes.txt.
> Maybe that will yield a little more review from other folks.

Will incorporate changes to kprobes.txt as well.

- Abhishek

[PATCH] task_pid_nr_ns() breaks proc_pid_readdir()

2007-11-17 Thread Oleg Nesterov

proc_pid_readdir:

for (...; ...; task = next_tgid(tgid + 1, ns)) {
tgid = task_pid_nr_ns(task, ns);
... use tgid ...

The first problem is that task_pid_nr_ns() can race with RCU and read the
freed memory.

However, rcu_read_lock() can't help. next_tgid() returns a pinned task_struct,
but the task can be released (and its pid detached) before task_pid_nr_ns()
reads the pid_t value. In that case task_pid_nr_ns() returns 0, thus breaking
the whole logic.

Make sure that task_pid_nr_ns() returns !0 before updating tgid. Note that
next_tgid(tgid + 1) can find the same "struct pid" again, but we shouldn't
go into the endless loop because pid_task(PIDTYPE_PID) must return NULL in
this case, so next_tgid() can't return the same task.
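
The loop logic described above can be modelled in userspace roughly as follows. This is only a simulation under stated assumptions: the demo_* names are hypothetical, the table is sorted by tgid, and the exits_after_lookup flag fakes the race by detaching a task right after the lookup returns it.

```c
#include <assert.h>
#include <stddef.h>

struct demo_task {
    int tgid;
    int hashed;              /* still reachable through the pid lookup */
    int exits_after_lookup;  /* simulate release right after lookup */
};

/* Stand-in for next_tgid(): first reachable task with tgid >= start. */
static struct demo_task *demo_next_tgid(struct demo_task *tab, size_t n,
                                        int start)
{
    for (size_t i = 0; i < n; i++) {
        struct demo_task *t = &tab[i];

        if (t->hashed && t->tgid >= start) {
            if (t->exits_after_lookup)
                t->hashed = 0;   /* the task is detached behind our back */
            return t;
        }
    }
    return NULL;
}

/* Stand-in for task_pid_nr_ns(): reports 0 once the pid is detached. */
static int demo_pid_nr(const struct demo_task *t)
{
    return t->hashed ? t->tgid : 0;
}

/* Walk all tasks, skipping any whose pid number reads back as 0. */
static size_t demo_readdir(struct demo_task *tab, size_t n,
                           int *out, size_t out_len)
{
    size_t count = 0;
    int tgid = 0;
    struct demo_task *task;

    for (task = demo_next_tgid(tab, n, tgid); task;
         task = demo_next_tgid(tab, n, tgid + 1)) {
        int nr = demo_pid_nr(task);

        if (!nr)
            continue;        /* keep the old tgid and look up again */
        tgid = nr;
        if (count < out_len)
            out[count++] = tgid;
    }
    return count;
}
```

Because a detached task is no longer reachable by lookup, retrying with the unchanged tgid moves on to the next task instead of looping forever.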

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

--- 24/fs/proc/base.c~pprd  2007-10-25 16:22:11.0 +0400
+++ 24/fs/proc/base.c   2007-11-17 20:58:14.0 +0300
@@ -2481,7 +2481,15 @@ int proc_pid_readdir(struct file * filp,
for (task = next_tgid(tgid, ns);
 task;
 put_task_struct(task), task = next_tgid(tgid + 1, ns)) {
-   tgid = task_pid_nr_ns(task, ns);
+   int nr;
+
+   rcu_read_lock();
+   nr = task_pid_nr_ns(task, ns);
+   rcu_read_unlock();
+   if (!nr)
+   continue;
+
+   tgid = nr;
filp->f_pos = tgid + TGID_OFFSET;
if (proc_pid_fill_cache(filp, dirent, filldir, task, tgid) < 0) {
put_task_struct(task);


Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

2007-11-17 Thread Ingo Molnar


* Torsten Kaiser <[EMAIL PROTECTED]> wrote:

> Sadly lockdep does not work for me, as it gets turned off early:
> [   39.851594] -
> [   39.855963] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
> [   39.861981] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
> [   39.866963]  (>list_lock){-+..}, at: []

hey, that means it found a bug - which is not sad at all :-)

Ingo

Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

2007-11-17 Thread Andrew Morton

On Sat, 17 Nov 2007 18:53:45 +0100 "Torsten Kaiser" <[EMAIL PROTECTED]> wrote:

> On Nov 16, 2007 3:15 PM, Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
> > Hi Andrew,
> >
> > The kernel enters the xmon state while running the file system
> > stress on nfs v4 mounted partition.
> [snip]
> > 0:mon> t
> > [c000dbd4fb50] c0069768 .__wake_up+0x54/0x88
> > [c000dbd4fc00] d086b890 .nfs_sb_deactive+0x44/0x58 [nfs]
> > [c000dbd4fc80] d0872658 .nfs_free_unlinkdata+0x2c/0x74 [nfs]
> > [c000dbd4fd10] d0598510 .rpc_release_calldata+0x50/0x74 [sunrpc]
> > [c000dbd4fda0] c008d960 .run_workqueue+0x10c/0x1f4
> > [c000dbd4fe50] c008ec70 .worker_thread+0x118/0x138
> > [c000dbd4ff00] c00939f4 .kthread+0x78/0xc4
> > [c000dbd4ff90] c002b060 .kernel_thread+0x4c/0x68
> 
> Definitely not a ppc problem.
> Got nearly the same backtrace on 64bit x86:
> [  966.712167] BUG: soft lockup - CPU#3 stuck for 11s! [rpciod/3:605]
> [  966.718522] CPU 3:
> [  966.720589] Modules linked in: radeon drm nfsd exportfs ipv6
> w83792d tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx
> tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
> videobuf_core btcx_risc tveeprom videodev usbhid v4l2_common
> v4l1_compat hid sg i2c_nforce2 pata_amd
> [  966.748306] Pid: 605, comm: rpciod/3 Not tainted 2.6.24-rc2-mm1 #4
> [  966.754653] RIP: 0010:[]  []
> _spin_lock_irqsave+0x12/0x30
> [  966.763424] RSP: 0018:81007ef33e28  EFLAGS: 0286
> [  966.768879] RAX: 0286 RBX: 81007ef33e60 RCX: 
> 
> [  966.776204] RDX: 0001 RSI: 0003 RDI: 
> 81011e107960
> [  966.783511] RBP: 81011cc6c588 R08: 8100db918130 R09: 
> 81011cc6c540
> [  966.790837] R10:  R11: 80266390 R12: 
> 8100d2d693a8
> [  966.798170] R13: 81011cc6c588 R14: 8100d2d693a8 R15: 
> 80302726
> [  966.805505] FS:  7f9e739d96f0() GS:81011ff12700()
> knlGS:
> [  966.813805] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
> [  966.819703] CR2: 01b691d0 CR3: 69861000 CR4: 
> 06e0
> [  966.827039] DR0:  DR1:  DR2: 
> 
> [  966.834362] DR3:  DR6: 0ff0 DR7: 
> 0400
> [  966.841687]
> [  966.841687] Call Trace:
> [  966.845728]  [] __wake_up+0x2d/0x70
> [  966.850900]  [] nfs_free_unlinkdata+0x1e/0x50
> [  966.857004]  [] rpc_release_calldata+0x26/0x50
> [  966.863161]  [] rpc_async_schedule+0x0/0x10
> [  966.869078]  [] run_workqueue+0xcc/0x170
> [  966.874705]  [] worker_thread+0x0/0xb0
> [  966.880163]  [] worker_thread+0x0/0xb0
> [  966.885610]  [] worker_thread+0x6d/0xb0
> [  966.891148]  [] autoremove_wake_function+0x0/0x30
> [  966.897606]  [] worker_thread+0x0/0xb0
> [  966.903045]  [] worker_thread+0x0/0xb0
> [  966.908485]  [] kthread+0x4b/0x80
> [  966.913484]  [] child_rip+0xa/0x12
> [  966.918579]  [] kthread+0x0/0x80
> [  966.923498]  [] child_rip+0x0/0x12
> [  966.928584]

I don't know what's causing that.  I spose I should set up nfs4.

> Sadly lockdep does not work for me, as it gets turned off early:
> [   39.851594] -
> [   39.855963] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
> [   39.861981] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
> [   39.866963]  (>list_lock){-+..}, at: []
> add_partial+0x31/0xa0
> [   39.874712] {softirq-on-W} state was registered at:
> [   39.879788]   [] __lock_acquire+0x3e8/0x1140
> [   39.885763]   [] debug_check_no_locks_freed+0x188/0x1a0
> [   39.892682]   [] lock_acquire+0x55/0x70
> [   39.898840]   [] add_partial+0x31/0xa0
> [   39.904288]   [] _spin_lock+0x1e/0x30
> [   39.909650]   [] add_partial+0x31/0xa0
> [   39.915097]   [] kmem_cache_open+0x1cc/0x330
> [   39.921066]   [] _spin_unlock_irq+0x24/0x30
> [   39.926946]   [] create_kmalloc_cache+0x64/0xf0
> [   39.933172]   [] init_alloc_cpu_cpu+0x70/0x90
> [   39.939226]   [] kmem_cache_init+0x65/0x1d0
> [   39.945289]   [] start_kernel+0x23e/0x350
> [   39.950996]   [] _sinittext+0x12d/0x140
> [   39.956529]   [] 0x
> [   39.961720] irq event stamp: 1207
> [   39.965048] hardirqs last  enabled at (1206): []
> debug_check_no_locks_freed+0x188/0x1a0
> [   39.974701] hardirqs last disabled at (1207): []
> __slab_free+0x3b/0x190
> [   39.982968] softirqs last  enabled at (570): []
> call_softirq+0x1c/0x30
> [   39.991148] softirqs last disabled at (1197): []
> call_softirq+0x1c/0x30
> [   39.999415]
> [   39.999416] other info that might help us debug this:
> [   40.005990] no locks held by swapper/0.
> [   40.010018]
> [   40.010018] stack backtrace:
> [   40.014429]
> [   40.014429] Call Trace:
> [   40.018407][] print_usage_bug+0x18c/0x1a0
> [   40.024817]  [] mark_lock+0x64c/0x660
> [   40.030057]  [] __lock_acquire+0x39e/0x1140
> [   40.035818]  []

Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

2007-11-17 Thread Ingo Molnar


* Daniel Walker <[EMAIL PROTECTED]> wrote:

> On Sat, 2007-11-17 at 18:46 +0100, Ingo Molnar wrote:
> > * Daniel Walker <[EMAIL PROTECTED]> wrote:
> > 
> > > > Actually, IMO, compat_semaphores behave like semaphores should 
> > > > behave, and thus the same as they behave on a non-RT kernel, and at 
> > > > the locations where the semaphores are now misused as mutexes on RT, 
> > > > we should replace them by differently-named-mutex-type-semaphores, 
> > > > or better: real-RT-mutexes..
> > > 
> > > The vast majority of semaphore are actually binary semaphores in the 
> > > Linux kernel .. So it's easier to mass convert semaphores to mutexes, 
> > > then address the ones that don't conform.. Usually they are converted 
> > > to the complete API in mainline..
> > 
> > right now there are 3992 mutex_lock() critical sections in the kernel 
> > and only 351 down() based critical sections are left.
> > 
> > fixing the top 20:
> > 
> >   4 _bus_priv.probe_mutex
> >   5 _lock
> >   5 _ptr->setting_up_sema
> >   5 >sem
> >   5 _res_mutex
> >   5 >port_lock
> >   5 _init_sem
> >   6 _handler_sem
> >   6 >parent->sem
> >   6 _lock
> >   6 >vport_sem
> >   7 _buffer_sem
> >   8 _f->sem
> >   9 >alloc_sem
> >  11 >sem
> >  11 >lock
> >  12 >erase_free_sem
> >  15 >scheduler_lock
> >  16 _data.config_sema
> >  17 >sem
> > 
> > would remove 164 of them, so it would convert half of the remaining 
> > semaphore use in the kernel. So the job is almost finished - would 
> > anyone like to go for the final grand feat: complete removal of 
> > semaphores from the kernel? :-)
> 
> Sure, you want to split the list?

split the list with you? Feel free to take any of those :-) dev->sem is 
nontrivial and probably not possible right now - and some of the others 
might be problematic too. But there might be fixable ones in the list. 
This shouldn't become like the BKL conversion - never truly finished.

Ingo

[REQUEST] New boot flag/kernel option

2007-11-17 Thread Raymano Garibaldi

I would like to request a new boot flag/kernel option that would make
the following scenario possible:

1) Working on laptop with a live USB distro on a read-only USB stick.
2) Suspend laptop.
3) Detach USB stick.

4) Do other things, get on a plane, go on a bus, deal with police
officer giving you a ticket for operating a laptop while driving...

5) Attach the same read-only USB stick.
6) Resume laptop.
7) Continue work as if nothing happened.

The last time we were able to do something like this was in 2.6.21.

If not, could you please advise a workaround to get this functionality
with the latest kernels.

Thank you very much,
Raymano

Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

2007-11-17 Thread Daniel Walker

On Sat, 2007-11-17 at 18:46 +0100, Ingo Molnar wrote:
> * Daniel Walker <[EMAIL PROTECTED]> wrote:
> 
> > > Actually, IMO, compat_semaphores behave like semaphores should 
> > > behave, and thus the same as they behave on a non-RT kernel, and at 
> > > the locations where the semaphores are now misused as mutexes on RT, 
> > > we should replace them by differently-named-mutex-type-semaphores, 
> > > or better: real-RT-mutexes..
> > 
> > The vast majority of semaphore are actually binary semaphores in the 
> > Linux kernel .. So it's easier to mass convert semaphores to mutexes, 
> > then address the ones that don't conform.. Usually they are converted 
> > to the complete API in mainline..
> 
> right now there are 3992 mutex_lock() critical sections in the kernel 
> and only 351 down() based critical sections are left.
> 
> fixing the top 20:
> 
>   4 _bus_priv.probe_mutex
>   5 _lock
>   5 _ptr->setting_up_sema
>   5 >sem
>   5 _res_mutex
>   5 >port_lock
>   5 _init_sem
>   6 _handler_sem
>   6 >parent->sem
>   6 _lock
>   6 >vport_sem
>   7 _buffer_sem
>   8 _f->sem
>   9 >alloc_sem
>  11 >sem
>  11 >lock
>  12 >erase_free_sem
>  15 >scheduler_lock
>  16 _data.config_sema
>  17 >sem
> 
> would remove 164 of them, so it would convert half of the remaining 
> semaphore use in the kernel. So the job is almost finished - would 
> anyone like to go for the final grand feat: complete removal of 
> semaphores from the kernel? :-)

Sure, you want to split the list?

Daniel


Re: Is it possible to give the user the option to cancel forkbombs?

2007-11-17 Thread Dane Mutters


On Sat, 2007-11-17 at 16:53 +0100, Diego Calleja wrote:
> On Sat, 17 Nov 2007 09:42:51 -0800, Martin Olsson <[EMAIL PROTECTED]> wrote:
> 
> > I don't think that setting a max process count by default is a 
> > good/viable solution. 
> 
> 
> I don't see why...OS X had a default limit of 100 processes per uid
> (increased to 266 in 10.5) and "it works" (many people notice it, but
> it's not surprising since the limit is too restrictive).
> 
> If you don't have limits, you can't avoid starvation easily. From my
> experience, since I use CFS, fork/compile bombs (forgetting to put a
> number after make -j...) are very sluggish mainly because the whole
> graphic subsystem is paged out.

I don't know if this is at all feasible, but is it possible to have a
mechanism that would detect a fork bomb in progress and either stop the
fork, or allow the user to cancel the operation?  For example, are there
any legitimate processes (i.e. ones that really need to fork like crazy)
that would need to generate 200+ processes in less than 1 second?

(Note: I'm not a programmer; I'm just throwing out the idea.)

-Dane
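
A minimal sketch of the rate-based detection idea asked about above, written as
plain userspace C rather than kernel code.  The names fork_window and
fork_allowed, and the per-user bookkeeping, are hypothetical; the 200-fork
threshold and one-second window are simply the numbers from the question.  This
is not an existing kernel interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-user bookkeeping; not an existing kernel structure. */
struct fork_window {
	uint64_t window_start_ms;	/* start of the current window */
	unsigned int forks;		/* forks counted in this window */
};

#define FORK_WINDOW_MS   1000u	/* window length: one second */
#define FORK_BURST_LIMIT 200u	/* forks allowed per window (assumption) */

/*
 * Record one fork attempt at time now_ms.  Returns true if the fork may
 * proceed, false if the burst limit for the current window is exhausted.
 */
static bool fork_allowed(struct fork_window *w, uint64_t now_ms)
{
	if (now_ms - w->window_start_ms >= FORK_WINDOW_MS) {
		/* The old window has expired: start counting afresh. */
		w->window_start_ms = now_ms;
		w->forks = 0;
	}
	if (w->forks >= FORK_BURST_LIMIT)
		return false;
	w->forks++;
	return true;
}
```

A caller would consult fork_allowed() before each fork and, on a false return,
either fail the fork or ask the user whether to continue.  A fixed window like
this would also throttle a legitimate large "make -j", which is one reason a
threshold is hard to pick.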


Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

2007-11-17 Thread Torsten Kaiser

On Nov 16, 2007 3:15 PM, Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
> Hi Andrew,
>
> The kernel enters the xmon state while running the file system
> stress on nfs v4 mounted partition.
[snip]
> 0:mon> t
> [c000dbd4fb50] c0069768 .__wake_up+0x54/0x88
> [c000dbd4fc00] d086b890 .nfs_sb_deactive+0x44/0x58 [nfs]
> [c000dbd4fc80] d0872658 .nfs_free_unlinkdata+0x2c/0x74 [nfs]
> [c000dbd4fd10] d0598510 .rpc_release_calldata+0x50/0x74 [sunrpc]
> [c000dbd4fda0] c008d960 .run_workqueue+0x10c/0x1f4
> [c000dbd4fe50] c008ec70 .worker_thread+0x118/0x138
> [c000dbd4ff00] c00939f4 .kthread+0x78/0xc4
> [c000dbd4ff90] c002b060 .kernel_thread+0x4c/0x68

Definitely not a ppc problem.
Got nearly the same backtrace on 64bit x86:
[  966.712167] BUG: soft lockup - CPU#3 stuck for 11s! [rpciod/3:605]
[  966.718522] CPU 3:
[  966.720589] Modules linked in: radeon drm nfsd exportfs ipv6
w83792d tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx
tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
videobuf_core btcx_risc tveeprom videodev usbhid v4l2_common
v4l1_compat hid sg i2c_nforce2 pata_amd
[  966.748306] Pid: 605, comm: rpciod/3 Not tainted 2.6.24-rc2-mm1 #4
[  966.754653] RIP: 0010:[]  []
_spin_lock_irqsave+0x12/0x30
[  966.763424] RSP: 0018:81007ef33e28  EFLAGS: 0286
[  966.768879] RAX: 0286 RBX: 81007ef33e60 RCX: 
[  966.776204] RDX: 0001 RSI: 0003 RDI: 81011e107960
[  966.783511] RBP: 81011cc6c588 R08: 8100db918130 R09: 81011cc6c540
[  966.790837] R10:  R11: 80266390 R12: 8100d2d693a8
[  966.798170] R13: 81011cc6c588 R14: 8100d2d693a8 R15: 80302726
[  966.805505] FS:  7f9e739d96f0() GS:81011ff12700()
knlGS:
[  966.813805] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
[  966.819703] CR2: 01b691d0 CR3: 69861000 CR4: 06e0
[  966.827039] DR0:  DR1:  DR2: 
[  966.834362] DR3:  DR6: 0ff0 DR7: 0400
[  966.841687]
[  966.841687] Call Trace:
[  966.845728]  [] __wake_up+0x2d/0x70
[  966.850900]  [] nfs_free_unlinkdata+0x1e/0x50
[  966.857004]  [] rpc_release_calldata+0x26/0x50
[  966.863161]  [] rpc_async_schedule+0x0/0x10
[  966.869078]  [] run_workqueue+0xcc/0x170
[  966.874705]  [] worker_thread+0x0/0xb0
[  966.880163]  [] worker_thread+0x0/0xb0
[  966.885610]  [] worker_thread+0x6d/0xb0
[  966.891148]  [] autoremove_wake_function+0x0/0x30
[  966.897606]  [] worker_thread+0x0/0xb0
[  966.903045]  [] worker_thread+0x0/0xb0
[  966.908485]  [] kthread+0x4b/0x80
[  966.913484]  [] child_rip+0xa/0x12
[  966.918579]  [] kthread+0x0/0x80
[  966.923498]  [] child_rip+0x0/0x12
[  966.928584]

Sadly lockdep does not work for me, as it gets turned off early:
[   39.851594] -
[   39.855963] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
[   39.861981] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
[   39.866963]  (>list_lock){-+..}, at: []
add_partial+0x31/0xa0
[   39.874712] {softirq-on-W} state was registered at:
[   39.879788]   [] __lock_acquire+0x3e8/0x1140
[   39.885763]   [] debug_check_no_locks_freed+0x188/0x1a0
[   39.892682]   [] lock_acquire+0x55/0x70
[   39.898840]   [] add_partial+0x31/0xa0
[   39.904288]   [] _spin_lock+0x1e/0x30
[   39.909650]   [] add_partial+0x31/0xa0
[   39.915097]   [] kmem_cache_open+0x1cc/0x330
[   39.921066]   [] _spin_unlock_irq+0x24/0x30
[   39.926946]   [] create_kmalloc_cache+0x64/0xf0
[   39.933172]   [] init_alloc_cpu_cpu+0x70/0x90
[   39.939226]   [] kmem_cache_init+0x65/0x1d0
[   39.945289]   [] start_kernel+0x23e/0x350
[   39.950996]   [] _sinittext+0x12d/0x140
[   39.956529]   [] 0x
[   39.961720] irq event stamp: 1207
[   39.965048] hardirqs last  enabled at (1206): []
debug_check_no_locks_freed+0x188/0x1a0
[   39.974701] hardirqs last disabled at (1207): []
__slab_free+0x3b/0x190
[   39.982968] softirqs last  enabled at (570): []
call_softirq+0x1c/0x30
[   39.991148] softirqs last disabled at (1197): []
call_softirq+0x1c/0x30
[   39.999415]
[   39.999416] other info that might help us debug this:
[   40.005990] no locks held by swapper/0.
[   40.010018]
[   40.010018] stack backtrace:
[   40.014429]
[   40.014429] Call Trace:
[   40.018407][] print_usage_bug+0x18c/0x1a0
[   40.024817]  [] mark_lock+0x64c/0x660
[   40.030057]  [] __lock_acquire+0x39e/0x1140
[   40.035818]  [] save_trace+0x37/0xa0
[   40.040972]  [] __rcu_process_callbacks+0x8d/0x250
[   40.047335]  [] lock_acquire+0x55/0x70
[   40.052663]  [] add_partial+0x31/0xa0
[   40.057905]  [] trace_hardirqs_on+0x83/0x160
[   40.063750]  [] _spin_lock+0x1e/0x30
[   40.068905]  [] add_partial+0x31/0xa0
[   40.074311]  [] __slab_free+0x100/0x190
[   40.079724]  []

Re: [PATCH v3 17/17] (Avoid overload)

2007-11-17 Thread Gregory Haskins

>>> On Sat, Nov 17, 2007 at  1:33 AM, in message
<[EMAIL PROTECTED]>, Steven Rostedt <[EMAIL PROTECTED]>
wrote: 
 
> - if ((p->prio >= rq->rt.highest_prio)
> - && (p->nr_cpus_allowed > 1)) {
> + if (unlikely(rt_task(rq->curr))) {
>   int cpu = find_lowest_rq(p);
>  
>   return (cpu == -1) ? task_cpu(p) : cpu;

We probably should leave the "p->nr_cpus_allowed > 1" in the conditional since 
it doesn't make sense to look if the task can't go anywhere.

-Greg
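
A rough userspace sketch of the combined conditional suggested above.  The
struct layouts, the function select_task_rq_rt_sketch and the stub
find_lowest_rq_stub are illustrative stand-ins, not the kernel's definitions;
only the shape of the test (current task is RT, and the waking task may run on
more than one CPU) comes from the discussion.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel structures (illustrative only). */
struct task {
	int cpu;		/* CPU the task last ran on */
	int nr_cpus_allowed;	/* number of CPUs in its affinity mask */
};

struct runqueue {
	bool curr_is_rt;	/* stands in for rt_task(rq->curr) */
};

static int search_calls;	/* how often the (expensive) search ran */

/* Hypothetical stub for find_lowest_rq(); pretends a 2-CPU machine. */
static int find_lowest_rq_stub(const struct task *p)
{
	search_calls++;
	return (p->cpu + 1) % 2;
}

static int select_task_rq_rt_sketch(const struct runqueue *rq,
				    const struct task *p)
{
	/* Only search when the task is actually able to move. */
	if (rq->curr_is_rt && p->nr_cpus_allowed > 1) {
		int cpu = find_lowest_rq_stub(p);

		/* Mirrors the quoted code; the stub never returns -1. */
		return (cpu == -1) ? p->cpu : cpu;
	}
	return p->cpu;
}
```

With the extra check, a task pinned to one CPU skips the search entirely and
stays where it is.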


Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

2007-11-17 Thread Ingo Molnar


* Daniel Walker <[EMAIL PROTECTED]> wrote:

> > Actually, IMO, compat_semaphores behave like semaphores should 
> > behave, and thus the same as they behave on a non-RT kernel, and at 
> > the locations where the semaphores are now misused as mutexes on RT, 
> > we should replace them by differently-named-mutex-type-semaphores, 
> > or better: real-RT-mutexes..
> 
> The vast majority of semaphores are actually binary semaphores in the 
> Linux kernel .. So it's easier to mass convert semaphores to mutexes, 
> then address the ones that don't conform.. Usually they are converted 
> to the complete API in mainline..

right now there are 3992 mutex_lock() critical sections in the kernel 
and only 351 down() based critical sections are left.

fixing the top 20:

  4 _bus_priv.probe_mutex
  5 _lock
  5 _ptr->setting_up_sema
  5 >sem
  5 _res_mutex
  5 >port_lock
  5 _init_sem
  6 _handler_sem
  6 >parent->sem
  6 _lock
  6 >vport_sem
  7 _buffer_sem
  8 _f->sem
  9 >alloc_sem
 11 >sem
 11 >lock
 12 >erase_free_sem
 15 >scheduler_lock
 16 _data.config_sema
 17 >sem

would remove 164 of them, so it would convert half of the remaining 
semaphore use in the kernel. So the job is almost finished - would 
anyone like to go for the final grand feat: complete removal of 
semaphores from the kernel? :-)

Ingo

Re: [PATCH v3 17/17] (Avoid overload)

2007-11-17 Thread Gregory Haskins

>>> On Sat, Nov 17, 2007 at  1:33 AM, in message
<[EMAIL PROTECTED]>, Steven Rostedt <[EMAIL PROTECTED]>
wrote: 

> Sorry!  I forgot to put in a prologue for this patch.
> 
> Here it is.
> 
> 
> 
> This patch changes the searching for a run queue by a waking RT task
> to try to pick another runqueue if the currently running task
> is an RT task.
> 
> The reason is that RT tasks behave differently than normal
> tasks. Preempting a normal task to run a RT task to keep
> its cache hot is fine, because the preempted non-RT task
> may wait on that same runqueue to run again unless the
> migration thread comes along and pulls it off.
> 
> RT tasks behave differently. If one is preempted, it makes
> an active effort to continue to run. So by having a high
> priority task preempt a lower priority RT task, that lower
> RT task will then quickly try to run on another runqueue.
> This will cause that lower RT task to replace its nice
> hot cache (and TLB) with a completely cold one. This is
> done in the hope that the new high priority RT task will keep
> its cache hot.
> 
> Remember that this high priority RT task was just woken up.
> So it may likely have been sleeping for several milliseconds,
> and will end up with a cold cache anyway. RT tasks run till
> they voluntarily stop, or are preempted by a higher priority
> task. This means that it is unlikely that the woken RT task
> will have a hot cache to wake up to. So pushing off a lower
> RT task is just killing its cache for no good reason.

You make some excellent points here.  Out of curiosity, have you tried a 
comparison to see if it helps?

-Greg



Re: [PATCH v3 16/17] Fix schedstat handling

2007-11-17 Thread Gregory Haskins

>>> On Sat, Nov 17, 2007 at  1:21 AM, in message
<[EMAIL PROTECTED]>, Steven Rostedt <[EMAIL PROTECTED]>
wrote: 
> Gregory Haskins' RT balancing broke sched domains.

Doh! (though you mean s/domains/stats ;)

> This is a fix to allow it to still work.
> 
> Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
> 
> ---
>  include/linux/sched.h   |3 ++-
>  kernel/sched.c  |   17 ++---
>  kernel/sched_fair.c |   19 ++-
>  kernel/sched_idletask.c |3 ++-
>  kernel/sched_rt.c   |3 ++-
>  5 files changed, 34 insertions(+), 11 deletions(-)
> 
> Index: linux-compile.git/kernel/sched.c
> ===
> --- linux-compile.git.orig/kernel/sched.c 2007-11-17 00:15:57.0 
> -0500
> +++ linux-compile.git/kernel/sched.c  2007-11-17 00:15:57.0 -0500
> @@ -1453,6 +1453,7 @@ static int try_to_wake_up(struct task_st
>   unsigned long flags;
>   long old_state;
>   struct rq *rq;
> + struct sched_domain *this_sd = NULL;
>  #ifdef CONFIG_SMP
>   int new_cpu;
>  #endif
> @@ -1476,10 +1477,20 @@ static int try_to_wake_up(struct task_st
>   schedstat_inc(rq, ttwu_count);
>   if (cpu == this_cpu)
>   schedstat_inc(rq, ttwu_local);
> - else
> - schedstat_inc(rq->sd, ttwu_wake_remote);
> + else {
> +#ifdef CONFIG_SCHEDSTATS
> + struct sched_domain *sd;
> + for_each_domain(this_cpu, sd) {
> + if (cpu_isset(cpu, sd->span)) {
> + schedstat_inc(sd, ttwu_wake_remote);
> + this_sd = sd;
> + break;
> + }
> + }
> +#endif /* CONFIG_SCHEDSTATS */
> + }
>  
> - new_cpu = p->sched_class->select_task_rq(p, sync);
> + new_cpu = p->sched_class->select_task_rq(p, this_sd, sync);

I like this optimization, but I am thinking that the location of the stat 
update is now no longer relevant.  It should potentially go *after* the 
select_task_rq() so that we pick the sched_domain of the actual wake target, 
not the historical affinity.  If that is accurate, I'm sure you can finagle 
this optimization to work in that scenario too, but it will take a little 
re-work.
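
A small userspace sketch of the ordering suggested here: choose the wake target
first, then charge the remote-wakeup statistic to the domain that spans that
target.  The struct domain layout, the bitmask span, and the names
domain_spanning and account_wakeup are assumptions for illustration; the real
code walks sched domains with for_each_domain() and cpumask helpers.

```c
#include <stddef.h>

/* Illustrative stand-in for a sched domain: a CPU bitmask plus one stat. */
struct domain {
	unsigned long span;		/* bit N set => CPU N is in the domain */
	unsigned int ttwu_wake_remote;	/* remote wakeups charged here */
};

/*
 * Walk this CPU's domains from smallest to largest and return the first
 * one whose span contains cpu, or NULL if none does.
 */
static struct domain *domain_spanning(struct domain *doms, size_t n, int cpu)
{
	for (size_t i = 0; i < n; i++)
		if (doms[i].span & (1UL << cpu))
			return &doms[i];
	return NULL;
}

/*
 * new_cpu is assumed to have been chosen already (the select_task_rq()
 * step); only then is the statistic charged, against the actual target.
 */
static void account_wakeup(struct domain *doms, size_t n,
			   int this_cpu, int new_cpu)
{
	struct domain *sd;

	if (new_cpu == this_cpu)
		return;			/* local wakeup: nothing remote */

	sd = domain_spanning(doms, n, new_cpu);
	if (sd)
		sd->ttwu_wake_remote++;
}
```

The sketch leaves out how the domain lookup would be shared with
select_task_rq(), which is the re-work mentioned above.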


>  
>   if (new_cpu != cpu) {
>   set_task_cpu(p, new_cpu);
> Index: linux-compile.git/include/linux/sched.h
> ===
> --- linux-compile.git.orig/include/linux/sched.h  2007-11-17 
> 00:15:57.0 -0500
> +++ linux-compile.git/include/linux/sched.h   2007-11-17 00:15:57.0 
> -0500
> @@ -823,7 +823,8 @@ struct sched_class {
>   void (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
>   void (*dequeue_task) (struct rq *rq, struct task_struct *p, int sleep);
>   void (*yield_task) (struct rq *rq);
> - int  (*select_task_rq)(struct task_struct *p, int sync);
> + int  (*select_task_rq)(struct task_struct *p,
> +struct sched_domain *sd, int sync);
>  
>   void (*check_preempt_curr) (struct rq *rq, struct task_struct *p);
>  
> Index: linux-compile.git/kernel/sched_fair.c
> ===
> --- linux-compile.git.orig/kernel/sched_fair.c2007-11-17 
> 00:15:57.0 -0500
> +++ linux-compile.git/kernel/sched_fair.c 2007-11-17 00:43:44.0 
> -0500
> @@ -611,11 +611,12 @@ static inline int wake_idle(int cpu, str
>  #endif
>  
>  #ifdef CONFIG_SMP
> -static int select_task_rq_fair(struct task_struct *p, int sync)
> +static int select_task_rq_fair(struct task_struct *p,
> +struct sched_domain *this_sd, int sync)
>  {
>   int cpu, this_cpu;
>   struct rq *rq;
> - struct sched_domain *sd, *this_sd = NULL;
> + struct sched_domain *sd;
>   int new_cpu;
>  
>   cpu  = task_cpu(p);
> @@ -623,15 +624,23 @@ static int select_task_rq_fair(struct ta
>   this_cpu = smp_processor_id();
>   new_cpu  = cpu;
>  
> + if (cpu == this_cpu || unlikely(!cpu_isset(this_cpu, p->cpus_allowed)))
> + goto out_set_cpu;
> +
> +#ifndef CONFIG_SCHEDSTATS
> + /*
> +  * If SCHEDSTATS is configured, then this_sd would
> +  * have already been determined.
> +  */
>   for_each_domain(this_cpu, sd) {
>   if (cpu_isset(cpu, sd->span)) {
>   this_sd = sd;
>   break;
>   }
>   }
> -
> - if (unlikely(!cpu_isset(this_cpu, p->cpus_allowed)))
> - goto out_set_cpu;
> +#else
> + (void)sd; /* unused */
> +#endif /* CONFIG_SCHEDSTATS */
>  
>   /*
>* Check for affine wakeup and passive balancing possibilities.
> Index: linux-compile.git/kernel/sched_idletask.c
> ===
> ---

Re: [PATCH v3 10/17] Remove some CFS specific code from the wakeup path of RT tasks

2007-11-17 Thread Gregory Haskins

>>> On Sat, Nov 17, 2007 at  1:21 AM, in message
<[EMAIL PROTECTED]>, Steven Rostedt <[EMAIL PROTECTED]>
wrote: 

> -/*
> - * wake_idle() will wake a task on an idle cpu if task->cpu is
> - * not idle and an idle cpu is available.  The span of cpus to
> - * search starts with cpus closest then further out as needed,
> - * so we always favor a closer, idle cpu.
> - *
> - * Returns the CPU we should wake onto.
> - */
> -#if defined(ARCH_HAS_SCHED_WAKE_IDLE)
> -static int wake_idle(int cpu, struct task_struct *p)
> -{
> - cpumask_t tmp;
> - struct sched_domain *sd;
> - int i;
> -
> - /*
> -  * If it is idle, then it is the best cpu to run this task.
> -  *
> -  * This cpu is also the best, if it has more than one task already.
> -  * Siblings must be also busy(in most cases) as they didn't already
> -  * pickup the extra load from this cpu and hence we need not check
> -  * sibling runqueue info. This will avoid the checks and cache miss
> -  * penalities associated with that.
> -  */
> - if (idle_cpu(cpu) || cpu_rq(cpu)->nr_running > 1)
> - return cpu;
> -
> - for_each_domain(cpu, sd) {
> - if (sd->flags & SD_WAKE_IDLE) {
> - cpus_and(tmp, sd->span, p->cpus_allowed);
> - for_each_cpu_mask(i, tmp) {
> - if (idle_cpu(i)) {
> - if (i != task_cpu(p)) {
> - schedstat_inc(p,
> - se.nr_wakeups_idle);

  ^

[...]


> --- linux-compile.git.orig/kernel/sched_fair.c2007-11-16 
> 11:16:38.0 -0500
> +++ linux-compile.git/kernel/sched_fair.c 2007-11-16 22:23:39.0 
> -0500
> @@ -564,6 +564,137 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
>  }
>  
>  /*
> + * wake_idle() will wake a task on an idle cpu if task->cpu is
> + * not idle and an idle cpu is available.  The span of cpus to
> + * search starts with cpus closest then further out as needed,
> + * so we always favor a closer, idle cpu.
> + *
> + * Returns the CPU we should wake onto.
> + */
> +#if defined(ARCH_HAS_SCHED_WAKE_IDLE)
> +static int wake_idle(int cpu, struct task_struct *p)
> +{
> + cpumask_t tmp;
> + struct sched_domain *sd;
> + int i;
> +
> + /*
> +  * If it is idle, then it is the best cpu to run this task.
> +  *
> +  * This cpu is also the best, if it has more than one task already.
> +  * Siblings must be also busy(in most cases) as they didn't already
> +  * pickup the extra load from this cpu and hence we need not check
> +  * sibling runqueue info. This will avoid the checks and cache miss
> +  * penalities associated with that.
> +  */
> + if (idle_cpu(cpu) || cpu_rq(cpu)->nr_running > 1)
> + return cpu;
> +
> + for_each_domain(cpu, sd) {
> + if (sd->flags & SD_WAKE_IDLE) {
> + cpus_and(tmp, sd->span, p->cpus_allowed);
> + for_each_cpu_mask(i, tmp) {
> + if (idle_cpu(i))
> + return i;



Looks like some stuff that was added in 24 was inadvertently lost in the move 
when you merged the patches up from 23.1-rt11.  The attached patch is updated 
to move the new logic as well.

Regards,
-Greg


RT: Remove some CFS specific code from the wakeup path of RT tasks

From: Gregory Haskins <[EMAIL PROTECTED]>

The current wake-up code path tries to determine if it can optimize the
wake-up to "this_cpu" by computing load calculations.  The problem is that
these calculations are only relevant to CFS tasks where load is king.  For RT
tasks, priority is king.  So the load calculation is completely wasted
bandwidth.

Therefore, we create a new sched_class interface to help with
pre-wakeup routing decisions and move the load calculation as a function
of CFS task's class.

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 include/linux/sched.h   |1 
 kernel/sched.c  |  167 ---
 kernel/sched_fair.c |  148 ++
 kernel/sched_idletask.c |9 +++
 kernel/sched_rt.c   |   10 +++
 5 files changed, 195 insertions(+), 140 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e9e74de..253517b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -823,6 +823,7 @@ struct sched_class {
void (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
void (*dequeue_task) (struct rq *rq, struct task_struct *p, int sleep);
void (*yield_task) (struct rq *rq);
+   int

Re: [patch/rfc 1/4] GPIO implementation framework

2007-11-17 Thread David Brownell

On Saturday 17 November 2007, Jean Delvare wrote:
> On Tue, 13 Nov 2007 20:36:13 -0800, David Brownell wrote:
> > On Tuesday 13 November 2007, eric miao wrote:
> > >   if (!requested)
> > > - printk(KERN_DEBUG "GPIO-%d autorequested\n",
> > > - chip->base + offset);
> > > + pr_debug("GPIO-%d autorequested\n", gpio);
> > 
> > Leave the printk in ... this is the sort of thing we want
> > to see fixed, which becomes unlikely once you hide such
> > diagnostics.  And for that matter, what would be enabling
> > the "-DDEBUG" that would trigger a pr_debug() message?
> 
> The original code isn't correct either.

It's perfectly correct.  That it's an idiom you don't
seem to *like* is distinct from correctness.

> Either this is a debug message 
> and indeed pr_debug() should be used, or it's not and KERN_DEBUG should
> be replaced by a lower log level.

KERN_DEBUG is what says the message level is "debug".
Both styles log messages at that priority level.

Which is distinct from saying that the message should
vanish from non-debug builds ... that's what pr_debug
and friends do, by relying implicitly on "-DDEBUG".

In this case, the original code was saying that the
message should NOT just vanish.  One reason the patch
was incorrect was that even on its own terms, it was
wrong ... since it used the "-DDEBUG" mechanism wrong,
and prevented the message from *EVER* appearing.

- Dave
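
The distinction drawn above can be sketched in userspace C.  The names
printk_sketch, pr_debug_sketch and report_autorequest are stand-ins invented
for this example; the real printk() and pr_debug() live in the kernel headers.
The only behaviour the sketch mirrors is that a KERN_DEBUG printk is always
compiled in, while pr_debug() compiles to nothing unless the file is built
with -DDEBUG.

```c
#include <stdarg.h>
#include <stdio.h>

#define KERN_DEBUG "<7>"	/* log-level prefix, as in the kernel */

static int messages_emitted;	/* counts messages that were really built in */

/* Stand-in for printk(): always present, whatever the log level. */
static int printk_sketch(const char *fmt, ...)
{
	va_list ap;
	int n;

	messages_emitted++;
	va_start(ap, fmt);
	n = vprintf(fmt, ap);
	va_end(ap);
	return n;
}

/* Stand-in for pr_debug(): vanishes unless compiled with -DDEBUG. */
#ifdef DEBUG
#define pr_debug_sketch(...) printk_sketch(KERN_DEBUG __VA_ARGS__)
#else
#define pr_debug_sketch(...) do { } while (0)
#endif

static void report_autorequest(int gpio)
{
	/* Debug priority, but part of every build. */
	printk_sketch(KERN_DEBUG "GPIO-%d autorequested\n", gpio);

	/* Same priority, but only exists in -DDEBUG builds. */
	pr_debug_sketch("GPIO-%d autorequested\n", gpio);
}
```

Built without -DDEBUG, report_autorequest() emits one message; built with it,
two.  Both carry the same priority, so the choice between the two styles is
about whether the message survives in non-debug builds, not about its log
level.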

Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

2007-11-17 Thread Daniel Walker

On Sat, 2007-11-17 at 18:09 +0100, Remy Bohmer wrote:

> Actually, IMO, compat_semaphores behave like semaphores should behave,
> and thus the same as they behave on a non-RT kernel, and at the
> locations where the semaphores are now misused as mutexes on RT, we
> should replace them by differently-named-mutex-type-semaphores, or
> better: real-RT-mutexes..

The vast majority of semaphores are actually binary semaphores in the
Linux kernel .. So it's easier to mass convert semaphores to mutexes,
then address the ones that don't conform.. Usually they are converted to
the complete API in mainline..

> IMO this wrong usage of semaphores is solved by modifying the code
> that actually made proper use of the semaphores, and I think that if
> the naming matches the mainline kernel, we only need to patch the
> files that really need to be patched during the integration in
> mainline of the RT-patch.

As I say above, it's happening already.. Code is slowly getting converted
to the complete API or the code gets converted to use a binary semaphore
(or a mutex)..

Daniel


Re: [PATCH/RFC] eradicate bashisms in scripts/patch-kernel

2007-11-17 Thread Sam Ravnborg

On Sat, Nov 17, 2007 at 05:33:27PM +0100, Andreas Mohr wrote:
> Hi,
> 
> On Wed, Nov 14, 2007 at 02:46:27PM -0800, Randy Dunlap wrote:
> > On Mon, 5 Nov 2007 20:58:27 +0100 Andreas Mohr wrote:
> > > Feel free to go ahead, otherwise I'll try another patch sometime soon.
> > > All I care about is that the result works on (at least)
> > > one shell implementation _more_ than the current status ;)
> > 
> > Hi Andreas,
> > 
> > Can you comment on (or test) whether this patch is sufficient
> > for your needs?  And if so, is the Signed-off-by: A.M. OK?
> 
> Sorry, no, using dash (0.5.3-5) there's still a remaining
> 
> $ linux-2.6.22/scripts/patch-kernel linux-2.6.22 /usr/src/patch-2.6
> Current kernel version is 2.6.22 ( Holy Dancing Manatees, Batman!)
> linux-2.6.22/scripts/patch-kernel: 207: Syntax error: Bad substitution
> 
> error in the
> 
> # strip EXTRAVERSION to just a number (drop leading '.' and trailing additions)
> EXTRAVER=
> if [ x$EXTRAVERSION != "x" ]
> then
> >>> [l.207]   if [ ${EXTRAVERSION:0:1} == "." ]; then
> EXTRAVER=${EXTRAVERSION:1}
> else
> EXTRAVER=$EXTRAVERSION
> fi
> EXTRAVER=${EXTRAVER%%[[:punct:]]*}
> #echo "$PNAME: changing EXTRAVERSION from $EXTRAVERSION to $EXTRAVER"
> fi
> 
> part, which the sed expression (moderately successfully) tried to
> take care of.
> 
> The good part of the story is that the current corrected almost fully
> working version still works with bash (3.1dfsg-8), just like the
> original patch-kernel version.

Could you (or Randy) fix it up with the comment from Herbert
and submit the final version to me (with proper changelog and s-o-b).

Thanks,
Sam

Re: NFS Bug in 2.6.23 ?

2007-11-17 Thread Gianluca Alberici


Hello,

I have found out something new about the cfs problem:

1) on a 2.6.20, working good:

after starting cfs, nothing cattach'd:

zeus:~# cat /proc/fs/nfsfs/volumes
NV SERVER   PORT DEV   FSID
v2 7f01     be9  0:14  0:0

after cattaching (and listing the directory at least once; if not,
everything remains as above):


zeus:~# cat /proc/fs/nfsfs/volumes
NV SERVER   PORT DEV   FSID
v2 7f01     be9  0:14  0:0
v2 7f01     be9  0:15  1:0

from now everything is OK forever

2) On 2.6.23  at cfsd restart

mars:~# cat /proc/fs/nfsfs/volumes
NV SERVER   PORT DEV   FSID
v2 7f01     be9  0:14  0:0

after cattaching I get

mars:~# cat /proc/fs/nfsfs/volumes
NV SERVER   PORT DEV   FSID
v2 7f01     be9  0:14  1:0
v2 7f01     be9  0:15  1:0

The first FSID has changed and is 1:0, the same as the second, while on
2.6.20 this doesn't happen.
I know nothing about NFS internals, but isn't it strange to have the same
FSID on 2 volumes?


BTW, the 2 machines have the same disk image. No diffs but the kernel.

Maybe this means nothing, if so sorry for the noise.

Gianluca





Re: [PATCH/RFC] eradicate bashisms in scripts/patch-kernel

2007-11-17 Thread Adrian Bunk

On Wed, Oct 31, 2007 at 10:13:21PM +0100, Andreas Mohr wrote:

> Hello,

Hi Andreas,

> I was non-mildly horrified to find that the rather widely used patch-kernel
> script seems to rely on bash despite specifying the interpreter as #!/bin/sh,
> since my dash-using Debian install choked on it.
> 
> Thus I'm delivering a first, preliminary, non-reviewed change to make
> patch-kernel (a little bit more?) POSIX-compatible. It now survives both
> a dash and a bash run.
>...
> Comments?
>...

If that's easily possible it's OK.

But if it becomes possible I'd strongly favour simply changing the 
interpreter to #!/bin/bash which would also fix this problem.

> Thanks,
> 
> Andreas Mohr
>...

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed


Re: perfmon2 merge news

2007-11-17 Thread Patrick DEMICHEL

Yet another noisy linux HPC user

I hope to convince you, lkml developers, to pay more attention to our
HPC performance problems.

I will not try to convince you that our problems are also the problems
of many others users, I hope they will do it directly.

Imagine my company bought an expensive complex multi nodes, multi
sockets, multi cores machine.


This is cheap today, around 10M$
My company made the strange decision to go for linux, in fact we had
no choice : OOPS
This machine will be used to solve many fundamental problems like
meteorology, life, nanotechnologies, technologies, maths,
climatology, ...
Many of our scientists and developers will try to exploit the
potential of this machine to make some radically new sciences and make
breakthroughs in their domains. Some of those results could have a
major impact on everybody's life.

Then you see this is not just the problem of a bunch of desperate HPC users.

Moore's Law gave us the opportunity to solve many fundamental problems
by offering tons of cheap transistors,

but we all have 1 major problem : how to optimize our codes in that
context of massive parallelism?

Any idea what is massive?

Maybe you start to be familiar with tuning 4 cores.

We target shortly tuning millions of heterogeneous cores.

Good news for you, this is not the only problem we need to solve, but
this one is very serious.
And we know this is just an intermediate step towards something
continuously more complex and challenging.
There are tons of papers on the web written by many talented and
motivated people.
You need to be motivated to stay in this business :-)

Developing the complete software stack required to manage and use such
machines will require that a large number of different actors succeed
in going in the same direction and share the burden.
Nobody and no company can sustain all the required developments at
reasonable cost.
No company has the time and complete expertise to do it alone.
I hope collectively we can do it. Even that is not certain, as far as I can see today.

Following your logic, you can claim  "why such useless hardware
complexity? Do something simpler."
Here we have a problem, we cannot change the constants and laws of
physics, then we face the inevitable choice of massive parallelism,
complex memory hierarchies, complex micro architectures, complex
interconnects, variable elements, failing elements, ...
Quite some fun ahead in fact.

And I can promise you, the hardware designers are not lazy or short of
inspiration and they also have a growing infinitude of challenges.

Some people argue that some magic tools will decompose and tune the
programs automatically, then why you need performance tools in fact?
First, this will be done at the price of losing an enormous part of
the potential; secondly, the compilers will probably require extensive
support from the hardware counters to be somewhat effective. Most of
us target reasonable scalability and cannot afford to reach only 20% of
peak of anything.

A dream without some breakthrough on the tools side.

This is where we need advanced performance tools, tools that let the
largest number of developers understand how the architectures really
work. Not how we naively think they should work, but how they really work.

Theory and reality are not good friends, it's rare to meet them together.
We cannot afford a situation where only a few rare specialists can do an
always partial tuning; I am sure they also have limits, at least in time.
I am sure that as soon as advanced tools expose the real problems to
the developers in the right form, they will find innovative solutions.
Can you imagine modern medicine without scanners, X-rays, all the
sources of information on your body?

For us this is exactly the same thing, we desperately need advanced
performance tools, not one but many to attack the problems from
different angles.
And the tools should be easy to use, reliable, flexible, predictable,
ready to use when I need them, standard, installed everywhere and in
particular on the new platforms as soon as they appear, ...
The tools need to hide the complexity when I need it hidden, and expose it if required.
The tools will always be behind the requirements but I hope not too far.
I prefer tools that adapt to me than the opposite.


But I am realistic, I don't need perfect tools, I need tools I can
invest in learning and progressing a long time with them.

That's why, despite the cost of development, many developers
ended up developing their own performance tools.
This is my case: I spend more time developing the tools I need than
using them. I have no choice today, but this is unsustainable now.
What is currently available is not what is required, not
even close to the minimum of what I needed 5 years ago.


What Stephane is developing is layer 1 of what we need, something
that hides most of the complexity of the hardware counters, and it
is not Stephane's fault that this is very complex. This is not even

Re: sb live (emu10k1) stops working between 2.6.23.1 and 2.6.23.7

2007-11-17 Thread Jim Faulkner


I've done some more testing this morning, and it appears that the "ALSA:
emu10k1 - Fix memory corruption" patch from 2.6.23.6 has broken digital
output on my SB Live Value card.  Simply replacing the 2.6.23.7 emumixer.c
with the version included in 2.6.23.1 I was able to get digital output
working again under 2.6.23.7.

This does not appear to be a simple matter of adjusting the alsa mixer to
compensate for how the mixer controls are exposed to userspace.  I have
duplicated all alsamixer settings between 2.6.23.1 and stock 2.6.23.7 in
the "F5" view all controls mode, but still received no audio output.  I
also tried adjusting other volume controls which normally do not need
adjusting, but got nowhere.  It appears that the 2.6.23.6 emu10k1 patch
broke digital output entirely on this card.

Just to add to the information in my first message, the SB Live card is
connected from the yellow jack on the card to the "Digital Coax Input"
connector on my external amplifier.

thanks,
Jim Faulkner


On Fri, 16 Nov 2007, Jim Faulkner wrote:

>
> Hello,
>
> I have an SB Live Value card which uses the emu10k1 driver.  I use digital
> output to an external amplifier.  This has worked fine for many years, up
> to and including kernel 2.6.23.1.  Under 2.6.23.7, I have been unable
> to get any audio output.  I get the following errors when loading my
> asound.state under 2.6.23.7 using `alsactl restore`:
> alsactl: set_control:991: warning: name mismatch (IEC958 Playback
> Mask/IEC958 Playback Default) for control #222
> alsactl: set_control:993: warning: index mismatch (3/0) for control #222
> alsactl: set_control:993: warning: index mismatch (0/1) for control #223
> alsactl: set_control:993: warning: index mismatch (1/2) for control #224
> alsactl: set_control:985: warning: iface mismatch (3/2) for control #225
> alsactl: set_control:987: warning: device mismatch (2/0) for control #225
> alsactl: set_control:989: warning: subdevice mismatch (0/0) for control
> #225
> alsactl: set_control:991: warning: name mismatch (IEC958 Playback
> Default/SB Live Analog/Digital Output Jack) for control #225
> alsactl: set_control:993: warning: index mismatch (2/0) for control #225
> alsactl: set_control:995: failed to obtain info for control #225
> (Operation not permitted)
>
> I receive no errors when I load my asound.state under 2.6.23.1.
>
> Under 2.6.23.7, I get the message "No digital data" on my external
> amplifier, which normally I only see when the computer is powered down or
> rebooting.  I do not get that message under 2.6.23.1, even when no audio
> is being played.
>
> I've tried copying down my alsamixer settings under 2.6.23.1, and
> re-entering them under 2.6.23.7, however I have still been unable to get
> any audio output.
>
> Here is the relevant part of my `lspci -v`:
> 05:0d.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev
> 08)
> Subsystem: Creative Labs CT4780 SBLive! Value
> Flags: bus master, medium devsel, latency 64, IRQ 23
> I/O ports at dce0 [size=32]
> Capabilities: [dc] Power Management version 2
>
> Please let me know if you need any more information, and please CC my
> e-mail address on any correspondence, since I am not subscribed to this
> list.
>
> thanks,
> Jim Faulkner
>
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH][RFC] kprobes: Add user entry-handler in kretprobes

2007-11-17 Thread Abhishek Sagar

On Nov 17, 2007 4:39 AM, Jim Keniston <[EMAIL PROTECTED]> wrote:
> First of all, as promised, here's what would be different if it were
> implemented using the data-pouch approach:
>
> --- abhishek1.c 2007-11-16 13:57:13.0 -0800
> +++ jim1.c  2007-11-16 14:20:39.0 -0800
> @@ -50,15 +50,12 @@
> if (stats)
> return 1; /* recursive/nested call */
>
> -   stats = kmalloc(sizeof(struct prof_data), GFP_ATOMIC);
> -   if (!stats)
> -   return 1;
> +   stats = (struct prof_data *) ri->entry_info;
>
> stats->entry_stamp = sched_clock();
> stats->task = current;
> INIT_LIST_HEAD(&stats->list);
> list_add(&stats->list, _nodes);
> -   ri->data = stats;
> return 0;
>  }
>
> @@ -66,10 +63,9 @@
>  static int return_handler(struct kretprobe_instance *ri, struct pt_regs
> *regs)
>  {
> unsigned long flags;
> -   struct prof_data *stats = (struct prof_data *)ri->data;
> +   struct prof_data *stats = (struct prof_data *)ri->entry_info;
> u64 elapsed;
>
> -   BUG_ON(ri->data == NULL);
> elapsed = (long long)sched_clock() - (long long)stats->entry_stamp;
>
> /* update stats */
> @@ -79,13 +75,13 @@
> spin_unlock_irqrestore(_lock, flags);
>
> list_del(&stats->list);
> -   kfree(stats);
> return 0;
>  }
>
>  static struct kretprobe my_kretprobe = {
> .handler = return_handler,
> .entry_handler = entry_handler,
> +   .entry_info_sz = sizeof(struct prof_data)
>  };
>
>  /* called after every PRINT_DELAY seconds */
>
> So the data-pouch approach saves you a little code and a kmalloc/kfree
> round trip on each kretprobe hit.  A kmalloc/kfree round trip is about
> 80 ns on my system, or about 20% of the base cost of a kretprobe hit.  I
> don't know how important that is to people.
>
> I also note that this particular example maintains a per-task list of
> prof_data objects to avoid overcounting the time spent in a recursive
> function.  That adds about 30% to the size of your module source (136
> lines vs. 106, by my count).  I suspect that many instrumentation
> modules wouldn't need such a list.  However, without your ri->data
> pointer (or Kevin's ri->entry_info pouch), every instrumentation module
> that uses your enhancement would need such a list in order to map the ri
> to the per-instance.

Those are interesting numbers. Will incorporate pouching in the next
patch. Even with a data pointer or pouch, the mapping of ri (or
ri->data) would sometimes be necessary. It's required to catch
recursive/nested invocation cases. In the case of the time-measurement test
module, these invocations needed to be weeded out and therefore such a
list was required. Other scenarios might not care about it, e.g. a module
which measures the change in some global system state across every
call.

Thanks for the comments.

> Jim

- Abhishek

Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

2007-11-17 Thread Remy Bohmer

Hello Daniel,

Thanks for looking into it also.
Steven already made clear to me that the 'struct semaphore' type on
the RT-kernel should not be used as a counting-semaphore, but as some
sort of legacy-mutex... (The confusion that this will cause is clear
by now...)

I still do not understand the problems I had with the
interruptible-waits on a real rt-mutex, but I have to figure that out
again on Monday. Maybe one confusion led to another...

(Note: a completion will not work for me, because completions are not
designed for reuse across several threads. The read/write runs in user
context and as such can be called by different threads, which would
require an init of the completion before waiting on it, but that would be
racy: I could miss the wake-up because of the init.)

> So I converted your code to use a compat_semaphore, and no oops
> happens.. Which makes sense because compat_semaphores are designed to
> work the way you're using them.

Actually, IMO, compat_semaphores behave like semaphores should behave,
and thus the same as they behave on a non-RT kernel, and at the
locations where the semaphores are now misused as mutexes on RT, we
should replace them by differently-named-mutex-type-semaphores, or
better: real-RT-mutexes..
IMO this wrong usage of semaphores is solved by modifying the code
that actually made proper use of the semaphores, and I think that if
the naming matches the mainline kernel, we only need to patch the
files that really need to be patched during the integration in
mainline of the RT-patch.

Kind Regards,

Remy Bohmer

Re: [PATCH/RFC] eradicate bashisms in scripts/patch-kernel

2007-11-17 Thread Herbert Xu

On Sat, Nov 17, 2007 at 05:33:27PM +0100, Andreas Mohr wrote:
>
> # strip EXTRAVERSION to just a number (drop leading '.' and trailing 
> additions)
> EXTRAVER=
> if [ x$EXTRAVERSION != "x" ]
> then
> >>> [l.207]   if [ ${EXTRAVERSION:0:1} == "." ]; then
> EXTRAVER=${EXTRAVERSION:1}
> else
> EXTRAVER=$EXTRAVERSION
> fi
> EXTRAVER=${EXTRAVER%%[[:punct:]]*}
> #echo "$PNAME: changing EXTRAVERSION from $EXTRAVERSION to $EXTRAVER"
> fi
> 
> part, which the sed expression (moderately successfully) tried to
> take care of.

You don't need a sed expression for that.  In fact the inner
if clause can be replaced by the following line:

EXTRAVER=${EXTRAVERSION#.}
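
A minimal sketch of how the whole strip step could look with only POSIX parameter expansions, assuming the intent stated in the quoted comment (drop a leading '.' and any trailing additions). The function name strip_extraver is hypothetical and not part of scripts/patch-kernel; support for the [[:punct:]] class in patterns is specified by POSIX but is worth verifying on the target shell.

```shell
#!/bin/sh
# Hypothetical helper, not taken from scripts/patch-kernel: reduce an
# EXTRAVERSION string to its leading number without bash-only substrings.
strip_extraver() {
    ev=${1#.}                 # drop one leading '.', if there is one
    ev=${ev%%[[:punct:]]*}    # drop everything from the first punctuation char on
    printf '%s\n' "$ev"
}

strip_extraver ".7"         # prints 7
strip_extraver ".7-rc2"     # prints 7
strip_extraver "-rc1"       # prints an empty line
```

The '#' form removes the shortest matching prefix, so an EXTRAVERSION without a leading dot passes through unchanged and no separate test of the first character is needed.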

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [GIT pull] x86 updates for 2.6.24

2007-11-17 Thread Linus Torvalds



On Sat, 17 Nov 2007, Linus Torvalds wrote:
> 
> Heh. I applied it just before pulling, so it's there twice. 

.. btw, Thomas, you forgot to add your sign-off to your version.

Linus

Re: [GIT pull] x86 updates for 2.6.24

2007-11-17 Thread Linus Torvalds

On Sat, 17 Nov 2007, Thomas Gleixner wrote:
> 
> Just added Sam's latest fix for the x86 build mechanism on top of the 
> patches below.

Heh. I applied it just before pulling, so it's there twice. Git obviously 
then merged it without problems, so I didn't even notice until you 
mentioned it (because the diffstat matched your original diffstat due to 
the new commit you added not actually causing any differences ;)

Linus

Re: [PATCH 2/3] wait_task_stopped: tidy up the noreap case

2007-11-17 Thread Oleg Nesterov

On 11/16, Roland McGrath wrote:
>
> This is good, but not quite enough.  The original intent behind having the
> test was never to return mismatched stale/fresh data.  (Not that it ever
> really worked as intended.)  That is, it's fine if the task has woken up
> and done other things while WNOWAIT reports it as stopped--that's stale
> data, but it just means the waitid call happened "before" the resumption.
> However, it should not report anything that could not possibly have been
> true before the resumption.  i.e. a changed exit_code, which now means a
> normal termination status or a death signal, not the stop signal.  This
> also applies to the uid, in case the thread called setuid upon resuming
> (and even to ptracedness, not that that one really matters).  (It doesn't
> matter for rusage, since that's not really an exact change of state with
> reliable ordering anyway.)
>
> So the setting of uid and why should also move before read_unlock.

Yes, I agree, and I also realized this. In fact, I already tried to do this
long ago: http://marc.info/?l=linux-kernel&m=112809846204068; please note
that the !noreap branch should be changed as well.

This time I am trying to clean up (remove) the games with ->exit_state first.
I am mostly concerned about patch 3/3; what do you think about it?

And. Please note that 3/3 removes the "It must also be done with the write
lock held to prevent a race with the EXIT_ZOMBIE case" comment. Afaics, we
don't need write_lock(tasklist) any longer, we can simplify things further
and remove the EAGAIN case completely.

However, wait_task_stopped does:

/* move to end of parent's list to avoid starvation */
remove_parent(p);
add_parent(p);

That is why we need write_lock(). Is this really so important? Yes, the next
do_wait() can find another "interesting" task a bit faster, but only a little
bit. wait_task_continued() could be optimized in the same manner...

Also. I think the locking is not complete. {read,write}_lock(tasklist) can't
really pin the task in TRACED/STOPPED state. We need ->siglock to ensure that
the child can't escape from get_signal_to_deliver() at least, so it can't do
exit/setuid/etc. I was going to try to do this later, because this needs nasty
changes...

Oh well. OK, we can ignore patches 2-3 for now. I'd like to know your opinion
before going further, perhaps I missed something else.

> While you're at it, you could fix the status argument to wait_noreap_copyout.
> It should be just exit_code, not the WIFSTOPPED bit format it does now.

OK, unless Scott is going to do this.

Oleg.


Re: [PATCH/RFC] eradicate bashisms in scripts/patch-kernel

2007-11-17 Thread Andreas Mohr

Hi,

On Wed, Nov 14, 2007 at 02:46:27PM -0800, Randy Dunlap wrote:
> On Mon, 5 Nov 2007 20:58:27 +0100 Andreas Mohr wrote:
> > Feel free to go ahead, otherwise I'll try another patch sometime soon.
> > All I care about is that the result works on (at least)
> > one shell implementation _more_ than the current status ;)
> 
> Hi Andreas,
> 
> Can you comment on (or test) whether this patch is sufficient
> for your needs?  And if so, is the Signed-off-by: A.M. OK?

Sorry, no, using dash (0.5.3-5) there's still a remaining

$ linux-2.6.22/scripts/patch-kernel linux-2.6.22 /usr/src/patch-2.6
Current kernel version is 2.6.22 ( Holy Dancing Manatees, Batman!)
linux-2.6.22/scripts/patch-kernel: 207: Syntax error: Bad substitution

error in the

# strip EXTRAVERSION to just a number (drop leading '.' and trailing additions)
EXTRAVER=
if [ x$EXTRAVERSION != "x" ]
then
>>> [l.207]   if [ ${EXTRAVERSION:0:1} == "." ]; then
EXTRAVER=${EXTRAVERSION:1}
else
EXTRAVER=$EXTRAVERSION
fi
EXTRAVER=${EXTRAVER%%[[:punct:]]*}
#echo "$PNAME: changing EXTRAVERSION from $EXTRAVERSION to $EXTRAVER"
fi

part, which the sed expression (moderately successfully) tried to
take care of.

The good part of the story is that the current corrected almost fully
working version still works with bash (3.1dfsg-8), just like the
original patch-kernel version.

Signed-off-by would be fine by me.

Thanks,

Andreas Mohr

Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

2007-11-17 Thread Daniel Walker

On Sat, 2007-11-17 at 12:44 +0100, Remy Bohmer wrote:
> Hello Steven,
> 
> > The taker of a mutex must also be the one that releases it.  I don't see
> > how you could use a mutex for this. It really requires some kind of
> > completion, or a compat_semaphore.
> 
> I tried several ways of working around the bug, even tried
> implementing it with kernel threads and protecting global data with
> mutexes. Therefore I know that I have the same problem with mutexes. I
> just created a simple example that showed the problem quickly, this
> does not mean that this is the only case that does not work.

I tried your example and I was able to reproduce the OOPS that you
found.. Although there is one problem, you don't have the same number of
up()'s to down() calls so you end up leaving the dummy_read function
with the lock still held ..

Reviewing the OOPS and the warnings, it looks like you're progressively
corrupting the mutex waiter list since remove_waiter() actually leaves
the stack based waiter object on the waiter list.. (That's what it looks
like anyway)..

So I converted your code to use a compat_semaphore, and no oops
happens.. Which makes sense because compat_semaphores are designed to
work the way you're using them.

Daniel

