Re: RSDL v0.31

2007-03-16 Thread Ingo Molnar

* Nicholas Miell <[EMAIL PROTECTED]> wrote:

> > I'm saying that the current scheduler adjusts for interactive loads, 
> > this new one doesn't.  I'm seeing interactivity regressions, and 
> > they are not fixed with nice unless nice is used to maximum effect.  
> > I'm saying yes, I can lower my expectations, but no I don't want to.
> 
> Uh, no. Essentially, the current scheduler works around X's 
> brokenness, in an often unpredictable manner.

No. The two schedulers simply use different heuristics. RSDL uses _fewer_ 
heuristics, and thus gets some workloads right that the heuristics in 
the current scheduler got wrong. But it also gets some other workloads 
wrong.

so basically, the current scheduler has a built-in "auto-nice" feature, 
while RSDL relies more on manual assignment of nice values.

if you want no heuristics at all you can do it in the current scheduler: 
use SCHED_BATCH on your shell and start up X with that. I'd not mind 
tweaking SCHED_BATCH with an RSDL-alike timeslice quota system.
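
A minimal sketch of that SCHED_BATCH suggestion, assuming a Linux libc that exposes sched_setscheduler(). The helper names set_batch_policy() and run_as_batch() are made up for illustration; the idea is only that a policy set before exec is inherited by the shell and whatever it starts, X included.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

#ifndef SCHED_BATCH
#define SCHED_BATCH 3	/* Linux value; fallback if the libc headers hide it */
#endif

/*
 * Put the calling process under SCHED_BATCH.
 * Returns 0 on success, -1 on failure (errno set by sched_setscheduler()).
 */
int set_batch_policy(void)
{
	/* The static priority must be 0 for SCHED_BATCH. */
	struct sched_param param = { .sched_priority = 0 };

	return sched_setscheduler(0, SCHED_BATCH, &param);
}

/*
 * Hypothetical wrapper: switch to SCHED_BATCH, then exec the command so
 * that it and its children inherit the policy. Only returns on error.
 */
int run_as_batch(char *const argv[])
{
	assert(argv && argv[0]);
	if (set_batch_policy() != 0) {
		perror("sched_setscheduler");
		return -1;
	}
	execvp(argv[0], argv);
	perror("execvp");
	return -1;
}
```

Calling run_as_batch() with the argv of a shell would give roughly the setup described above; whether that removes every heuristic for a given workload is something only testing can show.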

so it is not at all clear to me that RSDL is indeed an improvement, if 
it does not have comparable auto-nice properties.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RSDL v0.31

2007-03-16 Thread Ingo Molnar

* Nicholas Miell <[EMAIL PROTECTED]> wrote:

> The X people have plans for how to go about fixing this, [...]

then we'll first have to wait for those X changes to at least be done in a 
minimal manner so that they can be tested for real with RSDL. (is it 
_really_ due to that? Or will X regress forever once we switch to RSDL?) 
We cannot regress the scheduling of a workload as important as "X mixed 
with CPU-intense tasks". And "in theory this should be fixed if X is 
fixed" does not cut it. X is pretty much _the_ most important thing to 
optimize the interactive behavior of a Linux scheduler for. Also, 
paradoxically, it is precisely the improvement of _X_ workloads that 
RSDL argues with.

this regression has to be fixed before RSDL can be merged, simply 
because it is a pretty negative effect that goes beyond any of the 
visible positive improvements that RSDL brings over the current 
scheduler. If it is better to fix X, then X has to be fixed _first_, at 
least in form of a prototype patch that can be _tested_, and then the 
result has to be validated against RSDL.

Ingo


Re: RSDL v0.31

2007-03-16 Thread Nicholas Miell
On Sat, 2007-03-17 at 00:25 -0700, William Lee Irwin III wrote:
> On Sat, Mar 17, 2007 at 08:11:57AM +0100, Mike Galbraith wrote:
> > On a side note, I wonder how long it's going to take to fix all the
> > X/client combinations out there.
> 
> AIUI X's clients largely access it via libraries X ships, so the X
> update will sweep the vast majority of them in one shot. You'll have
> to either run the clients from remote hosts with downrev libraries or
> have downrev libraries around (e.g. in chroots) for clients to link to
> for the clients not to cooperate.
> 

The changes will probably be entirely server-side anyway, so stray
ancient libraries won't be a problem.

-- 
Nicholas Miell <[EMAIL PROTECTED]>



Re: FF layer restrictions [Was: [PATCH 1/1] Input: add sensable phantom driver]

2007-03-16 Thread johann deneux

On 3/16/07, Pavel Machek <[EMAIL PROTECTED]> wrote:

> Hi!
>
> > >Why did you remove all Cced people? Anyway I filtered some of them out
> > >
> > >johann deneux wrote:
> > >> You are right, the direction in ff_effect is meant to be an angle.
> > >> A dirty solution would be to use the 16 bits as two 8-bit angles. Or
> > >
> > >That would be a problem as I need 3x 16bits.
> > >
> > >> maybe we should change the API. I don't think there are many
> > >> applications using force feedback yet, so maybe that should be ok?
> > >>
> > >> If we change the API, we should remove the assumption that a device has
> > >> at most two axes to render effects. We could for instance have a
> > >> magnitude argument for each axis which is capable of rendering effects.
> > >> That might be necessary even for more common gaming devices like racing
> > >> wheels: One can think pedals could also be capable of force feedback
> > >> some day, not just the steering wheel.
> > >
> > >I can do that, but in that case, I need to know how people (especially
> > >those input ones) want me to do...
> > >
> >
> > Since we have no idea how many programs (if any) are using force
> > feedback interface I would be wary of changing existing effects and
> > rather add new set of 3D effects.
> >
> > Do we have any idea if there are any users of FF out there?
>
> Number of linux games is quite low, so...
>
> --
> (english) http://www.livejournal.com/~pavelmachek
> (Czech, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



Games are not the only application type using FF. I got a few
enquiries from universities working on robotics project.
I think keeping backward compatibility is not a problem here. The
problem is to make an extension that does not duplicate the
capabilities of the existing API. We don't want to have two ways of
specifying the same effects.
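
To make the encodings discussed in this thread concrete, here is a rough sketch. It assumes the existing 16-bit direction field maps a full turn onto 0x0000-0xFFFF (so 90 degrees is 0x4000). The struct ff_axes_sketch and all helper names are hypothetical and not part of the input API; they only illustrate the "two 8-bit angles" hack versus a per-axis magnitude.

```c
#include <assert.h>
#include <stdint.h>

#define FF_SKETCH_MAX_AXES 3	/* assumption: x, y, z as asked for above */

/* Hypothetical per-axis layout; not an existing kernel structure. */
struct ff_axes_sketch {
	int16_t magnitude[FF_SKETCH_MAX_AXES];
};

/* Map degrees (0..359) onto the 16-bit range, so 90 degrees -> 0x4000. */
uint16_t deg_to_direction(unsigned int deg)
{
	assert(deg < 360);
	return (uint16_t)((deg * 0x10000UL) / 360);
}

/* The "dirty solution": two 8-bit angles squeezed into one 16-bit field. */
uint16_t pack_two_angles(uint8_t first, uint8_t second)
{
	return (uint16_t)((first << 8) | second);
}

/* Inverse of pack_two_angles(). */
void unpack_two_angles(uint16_t packed, uint8_t *first, uint8_t *second)
{
	assert(first && second);
	*first = (uint8_t)(packed >> 8);
	*second = (uint8_t)(packed & 0xff);
}
```

Packing leaves each angle only 256 steps and still gives two values rather than the three 16-bit ones requested, which is one reason a per-axis extension looks more attractive than reusing the field.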

--
Johann


Re: RSDL v0.31

2007-03-16 Thread William Lee Irwin III
On Sat, Mar 17, 2007 at 08:11:57AM +0100, Mike Galbraith wrote:
> On a side note, I wonder how long it's going to take to fix all the
> X/client combinations out there.

AIUI X's clients largely access it via libraries X ships, so the X
update will sweep the vast majority of them in one shot. You'll have
to either run the clients from remote hosts with downrev libraries or
have downrev libraries around (e.g. in chroots) for clients to link to
for the clients not to cooperate.


-- wli


Re: [PATCH] i386: trust the PM-Timer calibration of the local APIC timer

2007-03-16 Thread Ingo Molnar

* Thomas Gleixner <[EMAIL PROTECTED]> wrote:

> When PM-Timer is available for local APIC timer calibration we can 
> skip the verification of the calibrated time value. The resulting 
> error is quite small on a bunch of evaluated platforms and is less 
> harmful than the observed false positives.
> 
> We need to keep the verification on systems which have no PM-Timer to 
> avoid bogus local APIC timer calibrations in the range of factor 2-10, 
> which can be observed when switching off the PM-timer support in the 
> kernel configuration.
> 
> The wrong calibration values are probably caused by SMM code trying to 
> emulate a PS/2 keyboard from a (maybe connected or not) USB keyboard. 
> This prohibits the accurate delivery of PIT interrupts, which are used 
> to calibrate the local APIC timer. Unfortunately we have no way to 
> disable this BIOS misfeature in the early boot process.
> 
> Add also the dropped cpu_relax() back to the wait loops.
> 
> Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>

Acked-by: Ingo Molnar <[EMAIL PROTECTED]>
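
For readers unfamiliar with the idea, a sketch of how a PM-Timer reference can correct a calibration window that was stretched by delayed PIT interrupts. This is not the code from the patch: the function names and the 24-bit counter width are assumptions, and only the 3.579545 MHz PM-Timer frequency is taken as given.

```c
#include <assert.h>
#include <stdint.h>

#define PMTMR_HZ	3579545ULL	/* ACPI PM-Timer frequency */
#define PMTMR_MASK	0xFFFFFFu	/* assumption: 24-bit counter */

/* PM-Timer ticks elapsed between two reads, tolerating one wraparound. */
uint32_t pm_ticks_elapsed(uint32_t start, uint32_t now)
{
	return (now - start) & PMTMR_MASK;
}

/*
 * Rescale an APIC timer count that was measured over a nominal window of
 * window_ms milliseconds. pm_delta is how many PM-Timer ticks really
 * passed; if the window was stretched, the APIC count shrinks to match.
 */
uint64_t scale_apic_delta(uint64_t apic_delta, uint32_t pm_delta,
			  unsigned int window_ms)
{
	uint64_t expected = PMTMR_HZ * window_ms / 1000;

	assert(pm_delta != 0);
	return apic_delta * expected / pm_delta;
}
```

With a 100 ms window, a PM-Timer delta twice the expected value halves the APIC count, which is the size of error (factor 2-10) the description above attributes to systems without a usable reference.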

Ingo


Re: RSDL v0.31

2007-03-16 Thread Mike Galbraith
On Fri, 2007-03-16 at 23:26 -0700, Nicholas Miell wrote:

> RSDL appears to be completely deterministic, which is a very strong
> virtue.

Yes.  That's why RSDL aroused my curiosity big time.

> The X people have plans for how to go about fixing this, but until then,
> there's no reason to hold up kernel development.

I'm not in a position to hold up development.

On a side note, I wonder how long it's going to take to fix all the
X/client combinations out there.

-Mike



Re: [patch] dont use obsolete index() function in lxdialog

2007-03-16 Thread Sam Ravnborg
On Sat, Mar 17, 2007 at 02:37:07AM -0400, Mike Frysinger wrote:
> The index() function is obsolete, use strchr() instead.

Thanks, applied.

Sam


Re: [PATCH] Linux 2.4.x MTD CFI P30/P33 support

2007-03-16 Thread Willy Tarreau
Hi Alexey,

On Sat, Mar 17, 2007 at 02:45:13AM +0300, Korolev, Alexey wrote:
> Hello Willy,
> 
> The following patch adds support for P30 and P33 NOR flash to
> Linux 2.4.
> This flash is a substitute for J3 flash, which is widely used on Linux
> 2.4 kernels. Currently many customers wishing to replace J3 with P3x
> face issues on Linux 2.4. This patch resolves issues with P3x on all
> generic Linux kernel versions since 2.4.21.
> 
> The patch just allows using minor version "4" in CFI driver. Since
> differences between CFI minor version "3" and minor version "4" are
> small the CFI driver is not affected by this. Patch has been verified on
> Mainstone (PXA27x based) platform.

OK, it seems trivial enough to me. I have no problem merging this. Your
mailer has wrapped lines, but I'll fix this by hand.

David, are you OK too ?

Thanks,
Willy

> Signed-off-by: Alexey Korolev <[EMAIL PROTECTED]>
> ==
> --- a/drivers/mtd/chips/cfi_cmdset_0001.c 2003-06-13
> 18:51:34.0 +0400
> +++ b/drivers/mtd/chips/cfi_cmdset_0001.c 2007-02-16
> 21:39:50.0 +0300
> @@ -152,7 +152,7 @@
>   }
>   
>   if (extp->MajorVersion != '1' || 
> - (extp->MinorVersion < '0' || extp->MinorVersion >
> '3')) {
> + (extp->MinorVersion < '0' || extp->MinorVersion >
> '4')) {
>   printk(KERN_WARNING "  Unknown IntelExt Extended
> Query "
>  "version %c.%c.\n",  extp->MajorVersion,
>  extp->MinorVersion);
> ===
> 
> Thanks,
> Alexey


Re: Linux 2.6.21-rc4

2007-03-16 Thread Sam Ravnborg
On Fri, Mar 16, 2007 at 03:39:57PM -0700, Randy Dunlap wrote:
> On Fri, 16 Mar 2007 14:11:21 -0700 Randy Dunlap wrote:
> 
> > On Fri, 16 Mar 2007 09:33:54 -0700 (PDT) Linus Torvalds wrote:
> > 
> > > 
> > > I pushed out the -git trees yesterday, but then got distracted, so the 
> > > patches and tar-balls and the announcement got delayed until this 
> > > morning. 
> > > Oops. I'm a scatter-brain.
> > 
> > allmodconfig on i386:
> > 
> > WARNING: "default_idle" [arch/i386/kernel/apm.ko] undefined!
> > WARNING: "machine_real_restart" [arch/i386/kernel/apm.ko] undefined!
> > make[1]: *** [__modpost] Error 1
> > make: *** [modules] Error 2
> 
> Please ignore.
> 
> I think that this was the result of doing 'make allyesconfig && make all'
> followed by 'make allmodconfig && make all' without doing a 'make clean'
> between them.
But then we have a dependency error somewhere we need to track down.
I will try to test here.

Sam


[patch] dont use obsolete index() function in lxdialog

2007-03-16 Thread Mike Frysinger
The index() function is obsolete, use strchr() instead.

Signed-off-by: Mike Frysinger <[EMAIL PROTECTED]>
---
--- a/scripts/kconfig/lxdialog/util.c
+++ b/scripts/kconfig/lxdialog/util.c
@@ -336,7 +336,7 @@
 	newl = 1;
 	word = tempstr;
 	while (word && *word) {
-		sp = index(word, ' ');
+		sp = strchr(word, ' ');
 		if (sp)
 			*sp++ = 0;
 
@@ -348,7 +348,7 @@
 		if (wlen > room ||
 		    (newl && wlen < 4 && sp
 		     && wlen + 1 + strlen(sp) > room
-		     && (!(sp2 = index(sp, ' '))
+		     && (!(sp2 = strchr(sp, ' '))
 		         || wlen + 1 + (sp2 - sp) > room))) {
 			cur_y++;
 			cur_x = x;
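
As a standalone illustration of the replacement, here is a small sketch of the same word-walking pattern written with strchr(). The function split_words() is hypothetical and not taken from lxdialog; it only shows that strchr(s, ' ') is a drop-in for index(s, ' ').

```c
#include <assert.h>
#include <string.h>

/*
 * Split buf in place on single spaces and store up to max_words word
 * pointers in words[]. Returns the number of words stored.
 */
size_t split_words(char *buf, char **words, size_t max_words)
{
	size_t n = 0;
	char *word = buf;

	assert(buf && words);
	while (word && *word && n < max_words) {
		char *sp = strchr(word, ' ');	/* was: index(word, ' ') */

		if (sp)
			*sp++ = '\0';	/* terminate this word, step past the space */
		words[n++] = word;
		word = sp;		/* NULL once the last word is consumed */
	}
	return n;
}
```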


[PATCH 2.6.21-rc4] kernel/exit: Fix a comment and code contradiction

2007-03-16 Thread Ahmed S. Darwish
Hi list,

Comment in release_task() claims that group leader's parent process 
is signalled only if it desires so, which is not true.

Signed-off-by: Ahmed S. Darwish <[EMAIL PROTECTED]>
---

To save your time, here's the contradictory code which doesn't appear in 
the patch (it comes right after the patch's last line):

	leader = p->group_leader;
	if (leader != p && thread_group_empty(leader) &&
	    leader->exit_state == EXIT_ZOMBIE) {
		BUG_ON(leader->exit_signal == -1);
		do_notify_parent(leader, leader->exit_signal);


diff --git a/kernel/exit.c b/kernel/exit.c
index f132349..4a0a35f 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -152,7 +152,7 @@ repeat:
 	/*
 	 * If we are the last non-leader member of the thread
 	 * group, and the leader is zombie, then notify the
-	 * group leader's parent process. (if it wants notification.)
+	 * group leader's parent process.
 	 */
 	zap_leader = 0;
 	leader = p->group_leader;

-- 
Ahmed S. Darwish
http://darwish.07.googlepages.com



Re: RSDL v0.31

2007-03-16 Thread Nicholas Miell
On Sat, 2007-03-17 at 06:56 +0100, Mike Galbraith wrote:
> On Fri, 2007-03-16 at 21:24 -0700, Nicholas Miell wrote:
> 
> > Sorry, I haven't really been following this thread and now I'm confused.
> > 
> > You're saying that it's somehow the scheduler's fault that X isn't
> > running with a high enough priority?
> 
> I'm saying that the current scheduler adjusts for interactive loads,
> this new one doesn't.  I'm seeing interactivity regressions, and they
> are not fixed with nice unless nice is used to maximum effect.  I'm
> saying yes, I can lower my expectations, but no I don't want to.
> 
> A four line summary is as short as I can make it.
> 
>   -Mike

Uh, no. Essentially, the current scheduler works around X's brokenness,
in an often unpredictable manner.

RSDL appears to be completely deterministic, which is a very strong
virtue.

The X people have plans for how to go about fixing this, but until then,
there's no reason to hold up kernel development.

-- 
Nicholas Miell <[EMAIL PROTECTED]>



Re: RSDL v0.31

2007-03-16 Thread Mike Galbraith
P.S.  "utter failure" was too harsh.  What sticks in my craw is that the
world has to adjust to fit this new scheduler.

-Mike



Re: RSDL v0.31

2007-03-16 Thread Mike Galbraith
On Fri, 2007-03-16 at 21:24 -0700, Nicholas Miell wrote:

> Sorry, I haven't really been following this thread and now I'm confused.
> 
> You're saying that it's somehow the scheduler's fault that X isn't
> running with a high enough priority?

I'm saying that the current scheduler adjusts for interactive loads,
this new one doesn't.  I'm seeing interactivity regressions, and they
are not fixed with nice unless nice is used to maximum effect.  I'm
saying yes, I can lower my expectations, but no I don't want to.

A four line summary is as short as I can make it.

-Mike



Re: [PATCH] kprobes: fix sparse NULL warning

2007-03-16 Thread Ananth N Mavinakayanahalli
On Fri, Mar 16, 2007 at 06:34:36PM -0700, Randy Dunlap wrote:
> From: Randy Dunlap <[EMAIL PROTECTED]>
> 
> Fix sparse NULL warnings:
> kernel/kprobes.c:915:49: warning: Using plain integer as NULL pointer

Thanks for catching this Randy.
 
> Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
Acked-by: Ananth N Mavinakayanahalli <[EMAIL PROTECTED]>

> ---
>  kernel/kprobes.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> --- linux-2621-rc4.orig/kernel/kprobes.c
> +++ linux-2621-rc4/kernel/kprobes.c
> @@ -35,6 +35,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -912,7 +913,7 @@ static int __kprobes debugfs_kprobe_init
>   if (!dir)
>   return -ENOMEM;
> 
> - file = debugfs_create_file("list", 0444, dir , 0 ,
> + file = debugfs_create_file("list", 0444, dir, NULL,
>   &debugfs_kprobes_operations);
>   if (!file) {
>   debugfs_remove(dir);


Re: forced umount?

2007-03-16 Thread Mike Snitzer

On 3/16/07, Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:

> Mike Snitzer wrote:
> > Is this forced umount work even considered worthwhile by the greater
> > Linux community?  Is anyone actively working on this?
>
> Have a look at all the discussion about revoke/frevoke on lkml over the
> last week or two.


Thanks for the heads up; it's good to see that Pekka Enberg's work has
continued.  I actually stumbled onto that line of work earlier while
searching for more info on Tigran Aivazian's forced unmount (badfs)
patches:
http://lwn.net/Articles/192632/

Mike


Re: [RFC][PATCH] sys_fallocate() system call

2007-03-16 Thread Stephen Rothwell
On Fri, 16 Mar 2007 20:01:01 +0530 "Amit K. Arora" <[EMAIL PROTECTED]> wrote:
>

> +asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len);
>
> --- linux-2.6.20.1.orig/include/asm-powerpc/systbl.h
> +++ linux-2.6.20.1/include/asm-powerpc/systbl.h
> @@ -305,3 +305,4 @@ SYSCALL_SPU(faccessat)
>  COMPAT_SYS_SPU(get_robust_list)
>  COMPAT_SYS_SPU(set_robust_list)
>  COMPAT_SYS(move_pages)
> +SYSCALL(fallocate)

It is going to need to be a COMPAT_SYS call in powerpc because 32 bit
powerpc will pass the two loff_t's in pairs of registers while
64bit passes them in one register each.
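
A sketch of what such a compat entry point has to do, assuming each loff_t arrives as a high/low pair of 32-bit register values. All names here are hypothetical, the argument order and any register-pair alignment rules are assumptions, and fallocate_native_sketch() is only a stand-in so that the reassembly can be checked.

```c
#include <assert.h>
#include <stdint.h>

/* Join two 32-bit register values into one 64-bit quantity. */
int64_t merge_64(uint32_t high, uint32_t low)
{
	return (int64_t)(((uint64_t)high << 32) | low);
}

/* Stand-in for the native 64-bit entry point; it just returns the range end. */
int64_t fallocate_native_sketch(int fd, int mode, int64_t offset, int64_t len)
{
	assert(fd >= 0);
	(void)mode;
	return offset + len;
}

/*
 * Hypothetical 32-bit compat entry: offset and len each arrive as a
 * high/low pair and are reassembled before calling the native version.
 */
int64_t fallocate_compat_sketch(int fd, int mode,
				uint32_t offset_hi, uint32_t offset_lo,
				uint32_t len_hi, uint32_t len_lo)
{
	return fallocate_native_sketch(fd, mode,
				       merge_64(offset_hi, offset_lo),
				       merge_64(len_hi, len_lo));
}
```

Without a wrapper of this kind, a 32-bit caller's four half-registers would be read as if they were two full 64-bit arguments, giving the native call a garbage offset and length.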

--
Cheers,
Stephen Rothwell                    [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/




Re: forced umount?

2007-03-16 Thread Gene Heskett
On Saturday 17 March 2007, Mike Snitzer wrote:
>I'm interested in understanding the state of Linux with regard to
>_really_ forcing a filesystem to unmount.
>
>There is a (stale) project at OSDL that has various implementations:
>http://developer.osdl.org/dev/fumount/
>
>It's fairly clear that these efforts (e.g. badfs patches) haven't been
>given serious consideration for upstream inclusion.  Do others see
>value in the ability to _reliably_ force a umount by having Linux
>discard all IOs, open files, dirty inode buffers, etc of a "bad"
>blockdevice?  The goal is to not impact the availability or integrity
>of Linux while doing so.
>
>Is this forced umount work even considered worthwhile by the greater
>Linux community?  Is anyone actively working on this?

Having been 'caught out' on this subject more than a few times, usually by 
shutting down a remotely located box that was mounted via smb or cifs, 
and found the only way to get sanity back to the rest of the system was a 
hard reset of every other box that was also sharing that mount, I would 
think this is a worthwhile project.

Take that as a yes vote, from somebody who isn't franchised to vote on it 
in the first place, I'm just a user, usually playing the part of the 
canary in the coal mine.

>Mike

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
QOTD:
"There may be no excuse for laziness, but I'm sure looking."


Re: RSDL v0.31

2007-03-16 Thread Gene Heskett
Greetings Con & company;

I built and rebooted to 2.6.20.3-rsdl-0.31 earlier this evening, but 
purposely waited till amanda was well underway to make a report.

The report is that I really really have to work hard to tell that amanda 
is running even though the cpu according to gkrellm is running between 97 
and 99%.

For my loading then, this is as much an improvement over the -ck1 patch as 
it was over the un-patched but same version of the kernel.  FWIW, I'd 
also built a 2.6.20.3-rsdl-0.30 and ran it for a day but it was nearly as 
spastic as no patch.

Did I say I like this yet? :)

Now I'm waiting for 2.6.21-rc4 to make the mirrors & see if tar is still 
broken.  Based on the clues I've been able to find, I bz'd the tar since 
that's a fedora supplied rpm install.  Humm, I just now recalled that I 
have a tarball built tar-1.15.1 on another drive, I was using it when I 
was running FC2, so that might be something else to bisect against, and I 
will, bet on it.

Many thanks Con, this is very nice.  I've only seen one split second when 
the screen was about 2 chars behind my typing.  This is great. :-)

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
When a cow laughs, does milk come out of its nose?


Re: RSDL v0.31

2007-03-16 Thread Con Kolivas
On Saturday 17 March 2007 15:40, Al Boldi wrote:
> Con Kolivas wrote:
> > On Saturday 17 March 2007 08:55, Al Boldi wrote:
> > > With X nice'd at -10, and 11 hogs loading the cpu, interactivity looks
> > > good until the default timeslice/quota is exhausted and slows down.
> > > Maybe adjusting this according to nice could help.
> >
> > Not sure what you mean by that. It's still a fair distribution system,
> > it's just that nice -10 has quite a bit more cpu allocated than nice 0.
> > Eventually if you throw enough load at it it will slow down too.
>
> I mean #DEF_TIMESLICE seems like an initial quota, which gets reset with
> each major rotation.  Increasing this according to nice may give it more
> room for an interactivity boost, before being expired.  Alternatively, you
> may just reset the rotation, if the task didn't use its quota within one
> full major rotation, effectively skipping the expiry toll.

DEF_TIMESLICE is a value used for smp balancing and has no effect on quota so 
I doubt you mean that value. The quota you're describing of not resetting is 
something like the sleep average idea of current systems where you accumulate 
bonus points by sleeping when you would be running and redeem them later. 
This is exactly the system I'm avoiding using in rsdl as you'd have to decide 
just how much sleep time it could accumulate, and over how long it would run 
out, and so on. ie that's the interactivity estimator. This is the system 
that destroys any guarantee of cpu percentage, and ends up leading to periods 
of relative starvation, is open to tuning that can either be too short or too 
long depending on the values you chose and so on.
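
A toy model of the contrast drawn above, for illustration only. It is neither mainline's sleep-average code nor RSDL's; struct task_sketch, the cap value and the function names are all assumptions. It shows why a sleep bonus needs tunables (how much can accumulate) and why it makes a task's CPU share depend on history.

```c
#include <assert.h>

#define MAX_SLEEP_BONUS_MS 1000	/* assumed cap on accumulated sleep credit */

struct task_sketch {
	unsigned int sleep_bonus_ms;	/* credit earned while sleeping */
	unsigned int quota_ms;		/* fixed per-rotation allowance */
};

/* Estimator style: sleeping earns credit, up to a tunable cap. */
void account_sleep(struct task_sketch *t, unsigned int slept_ms)
{
	unsigned int room;

	assert(t && t->sleep_bonus_ms <= MAX_SLEEP_BONUS_MS);
	room = MAX_SLEEP_BONUS_MS - t->sleep_bonus_ms;
	t->sleep_bonus_ms += slept_ms < room ? slept_ms : room;
}

/* Estimator style: what the task may run now depends on its sleep history. */
unsigned int runnable_ms_estimator(const struct task_sketch *t)
{
	return t->quota_ms + t->sleep_bonus_ms;
}

/* Fixed-quota style: history is ignored, so the share stays predictable. */
unsigned int runnable_ms_fixed(const struct task_sketch *t)
{
	return t->quota_ms;
}
```

In this model a task that slept for 400 ms may run far longer than its quota on wakeup, at the expense of everyone else, while the fixed variant always returns the same figure.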

> > > It may also be advisable to fix latencies according to nice, and adjust
> > > timeslices instead.  This may help scaleability a lot, as there are
> > > some timing sensitive apps that may crash under high load.
> >
> > You will find that is the case already with this version. Even under
> > heavy load if you were to be running one server niced (say httpd nice 19
> > in the presence of mysql nice 0) the latencies would be drastically
> > reduced compared to mainline behaviour. I am aware this becomes an issue
> > for some heavily loaded servers because some servers run multithreaded
> > while others do not, forcing the admins to nice their multithreaded ones.
>
> The thing is, latencies are currently dependent on the number of tasks in
> the run-queue; i.e. more rq-tasks means higher latencies, yet fixed
> timeslices according to nice.  Just switching this the other way around, by
> fixing latencies according to nice, and adjusting the timeslices depending
> on rq-load, may yield a much more scalable system.

That is not really feasible to implement. How can you guarantee latencies when 
the system is overloaded? If you have 1000 tasks all trying to get scheduled 
in, say, 10ms, each one ends up running for only 10 microseconds at a time. That 
achieves the exact opposite: as the load increases the runtime gets shorter 
and shorter, until the cpu cache thrashes and no real work gets done.
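
The arithmetic behind that objection, as a small sketch. The function name slice_us() is made up; it simply divides a fixed latency target evenly among the runnable tasks.

```c
#include <assert.h>
#include <stdint.h>

/* Per-task runtime if nr_running tasks must all run within latency_us. */
uint64_t slice_us(uint64_t latency_us, uint64_t nr_running)
{
	assert(nr_running > 0);
	return latency_us / nr_running;
}
```

With a 10 ms (10000 us) target, 10 tasks get 1000 us each, but 1000 tasks get only 10 us each, so the slice shrinks in direct proportion to load.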

> Thanks!
>
> --
> Al

-- 
-ck


Re: RSDL v0.31

2007-03-16 Thread Al Boldi
Con Kolivas wrote:
> On Saturday 17 March 2007 08:55, Al Boldi wrote:
> > With X nice'd at -10, and 11 hogs loading the cpu, interactivity looks
> > good until the default timeslice/quota is exhausted and slows down. 
> > Maybe adjusting this according to nice could help.
>
> Not sure what you mean by that. It's still a fair distribution system,
> it's just that nice -10 has quite a bit more cpu allocated than nice 0.
> Eventually if you throw enough load at it it will slow down too.

I mean #DEF_TIMESLICE seems like an initial quota, which gets reset with each 
major rotation.  Increasing this according to nice may give it more room for 
an interactivity boost, before being expired.  Alternatively, you may just 
reset the rotation, if the task didn't use its quota within one full major 
rotation, effectively skipping the expiry toll.

> > It may also be advisable to fix latencies according to nice, and adjust
> > timeslices instead.  This may help scalability a lot, as there are some
> > timing sensitive apps that may crash under high load.
>
> You will find that is the case already with this version. Even under heavy
> load if you were to be running one server niced (say httpd nice 19 in the
> presence of mysql nice 0) the latencies would be drastically reduced
> compared to mainline behaviour. I am aware this becomes an issue for some
> heavily loaded servers because some servers run multithreaded while others
> do not, forcing the admins to nice their multithreaded ones.

The thing is, latencies are currently dependent on the number of tasks in the 
run-queue; i.e. more rq-tasks means higher latencies, yet fixed timeslices 
according to nice.  Just switching this the other way around, by fixing 
latencies according to nice, and adjusting the timeslices depending on 
rq-load, may yield a much more scalable system.


Thanks!

--
Al



Re: forced umount?

2007-03-16 Thread Jeremy Fitzhardinge
Mike Snitzer wrote:
> Is this forced umount work even considered worthwhile by the greater
> Linux community?  Is anyone actively working on this?

Have a look at all the discussion about revoke/frevoke on lkml over the
last week or two.

J


Re: RSDL v0.31

2007-03-16 Thread Nicholas Miell
On Fri, 2007-03-16 at 23:30 +0100, Mike Galbraith wrote:
> On Sat, 2007-03-17 at 08:13 +1100, Con Kolivas wrote:
> > On Saturday 17 March 2007 02:34, Mike Galbraith wrote:
> > > On Sat, 2007-03-17 at 00:40 +1100, Con Kolivas wrote:
> > > > Here are full patches for rsdl 0.31 for various base kernels. A full
> > > > announce with a fresh -mm series will follow...
> > > >
> > > > http://ck.kolivas.org/patches/staircase-deadline/2.6.20.3-rsdl-0.31.patch
> > > > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-sched-rsdl-0.
> > > >31.patch
> > > > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.31
> > > >.patch
> > >
> > > It still has trouble with the x/gforce vs two niced encoders scenario.
> > > The previously reported choppiness is still present.
> > >
> > > I suspect that x/gforce landing in the expired array is the trouble, and
> > > that this will never be smooth without some kind of exemption.  I added
> > > some targeted unfairness to .30, and it didn't help much at all.
> > >
> > > Priorities going all the way to 1 were a surprise.
> > 
> > It wasn't going to change that case without renicing X.
> 
> Con.  You are trying to wedge a fair scheduler into an environment where
> totally fair simply can not possibly function.
> 
> If this is your final answer to the problem space, I am done testing,
> and as far as _I_ am concerned, your scheduler is an utter failure.
> 

Sorry, I haven't really been following this thread and now I'm confused.

You're saying that it's somehow the scheduler's fault that X isn't
running with a high enough priority?

-- 
Nicholas Miell <[EMAIL PROTECTED]>



forced umount?

2007-03-16 Thread Mike Snitzer

I'm interested in understanding the state of Linux with regard to
_really_ forcing a filesystem to unmount.

There is a (stale) project at OSDL that has various implementations:
http://developer.osdl.org/dev/fumount/

It's fairly clear that these efforts (e.g. badfs patches) haven't been
given serious consideration for upstream inclusion.  Do others see
value in the ability to _reliably_ force a umount by having Linux
discard all IOs, open files, dirty inode buffers, etc of a "bad"
blockdevice?  The goal is to not impact the availability or integrity
of Linux while doing so.

Is this forced umount work even considered worthwhile by the greater
Linux community?  Is anyone actively working on this?

Mike


[PATCH] MMC: Fix handling of low-voltage cards (take 2)

2007-03-16 Thread Philip Langdale
Fix handling of low voltage MMC cards.

The latest MMC and SD specs both agree that support for
low-voltage operations is indicated by bit 7 in the OCR.
The MMC spec states that the low voltage range is
1.65-1.95V while the SD spec leaves the actual voltage
range undefined - meaning that there is still no such
thing as a low voltage SD card.

However, an old Sandisk spec implied that bits 7..0
represented voltages below 2.0V in 0.1V or 0.05V increments,
and the code was accordingly written with that expectation.

This confusion meant that host drivers attempting to support
the typical low voltage (1.8V) would set the wrong bits in
the host OCR mask (usually bits 5 and/or 6) resulting in
the low voltage mode never being used.

This change corrects the low voltage range and adds sanity
checks on the reserved bits (0-6) and for SD cards that
claim to support low-voltage operations.

Signed-off-by: Philip Langdale <[EMAIL PROTECTED]>

diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index 21c0517..310b242 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -319,6 +319,17 @@ int mmc_attach_mmc(struct mmc_host *host

mmc_attach_bus(host, &mmc_ops);

+   /*
+* Sanity check the voltages that the card claims to
+* support.
+*/
+   if (ocr & 0x7F) {
+   printk(KERN_WARNING "%s: card claims to support voltages "
+  "below the defined range. These will be ignored.\n",
+  mmc_hostname(host));
+   ocr &= ~0x7F;
+   }
+
host->ocr = mmc_select_voltage(host, ocr);

/*
diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
index 6289ae6..e27845e 100644
--- a/drivers/mmc/core/sd.c
+++ b/drivers/mmc/core/sd.c
@@ -297,6 +297,24 @@ int mmc_attach_sd(struct mmc_host *host,

mmc_attach_bus(host, &mmc_sd_ops);

+   /*
+* Sanity check the voltages that the card claims to
+* support.
+*/
+   if (ocr & 0x7F) {
+   printk(KERN_WARNING "%s: card claims to support voltages "
+  "below the defined range. These will be ignored.\n",
+  mmc_hostname(host));
+   ocr &= ~0x7F;
+   }
+
+   if (ocr & MMC_VDD_165_195) {
+   printk(KERN_WARNING "%s: SD card claims to support the "
+  "incompletely defined 'low voltage range'. This "
+  "will be ignored.\n", mmc_hostname(host));
+   ocr &= ~MMC_VDD_165_195;
+   }
+
host->ocr = mmc_select_voltage(host, ocr);

/*
diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 2f34ae3..a80c043 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -669,8 +669,7 @@ static void sdhci_set_power(struct sdhci
pwr = SDHCI_POWER_ON;

switch (1 << power) {
-   case MMC_VDD_17_18:
-   case MMC_VDD_18_19:
+   case MMC_VDD_165_195:
pwr |= SDHCI_POWER_180;
break;
case MMC_VDD_29_30:
@@ -1290,7 +1289,7 @@ static int __devinit sdhci_probe_slot(st
if (caps & SDHCI_CAN_VDD_300)
mmc->ocr_avail |= MMC_VDD_29_30|MMC_VDD_30_31;
if (caps & SDHCI_CAN_VDD_180)
-   mmc->ocr_avail |= MMC_VDD_17_18|MMC_VDD_18_19;
+   mmc->ocr_avail |= MMC_VDD_165_195;

if (mmc->ocr_avail == 0) {
printk(KERN_ERR "%s: Hardware doesn't report any "
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index 5a66d8a..b1350df 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -65,14 +65,7 @@ struct mmc_host {
unsigned int f_max;
u32 ocr_avail;

-#define MMC_VDD_145_150 0x0001  /* VDD voltage 1.45 - 1.50 */
-#define MMC_VDD_150_155 0x0002  /* VDD voltage 1.50 - 1.55 */
-#define MMC_VDD_155_160 0x0004  /* VDD voltage 1.55 - 1.60 */
-#define MMC_VDD_160_165 0x0008  /* VDD voltage 1.60 - 1.65 */
-#define MMC_VDD_165_170 0x0010  /* VDD voltage 1.65 - 1.70 */
-#define MMC_VDD_17_18  0x0020  /* VDD voltage 1.7 - 1.8 */
-#define MMC_VDD_18_19  0x0040  /* VDD voltage 1.8 - 1.9 */
-#define MMC_VDD_19_20  0x0080  /* VDD voltage 1.9 - 2.0 */
+#define MMC_VDD_165_195 0x0080  /* VDD voltage 1.65 - 1.95 */
 #define MMC_VDD_20_21  0x0100  /* VDD voltage 2.0 ~ 2.1 */
 #define MMC_VDD_21_22  0x0200  /* VDD voltage 2.1 ~ 2.2 */
 #define MMC_VDD_22_23  0x0400  /* VDD voltage 2.2 ~ 2.3 */
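
As a rough illustration of the sanity checks described in this patch, the standalone C sketch below clears the reserved OCR bits (0-6) and, for SD cards only, the low-voltage bit 7. The names `OCR_RESERVED_MASK`, `OCR_VDD_165_195` and `sanitize_ocr()` are invented for this example and are not kernel identifiers; only the bit positions are taken from the patch description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the bit layout follows the patch description. */
#define OCR_RESERVED_MASK 0x0000007Fu /* bits 0-6: reserved */
#define OCR_VDD_165_195   0x00000080u /* bit 7: low-voltage range 1.65-1.95V */

/*
 * Return the OCR value with unsupported voltage claims cleared.
 * When is_sd is true, bit 7 is dropped as well, because the SD spec
 * leaves the low-voltage range undefined.
 */
static uint32_t sanitize_ocr(uint32_t ocr, bool is_sd)
{
    ocr &= ~OCR_RESERVED_MASK;
    if (is_sd)
        ocr &= ~OCR_VDD_165_195;
    return ocr;
}
```

Unlike the patch, this sketch does not log a warning when it clears bits; a caller could compare the input and output values to decide whether to print one.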



Re: RSDL v0.31

2007-03-16 Thread Con Kolivas
On Saturday 17 March 2007 08:55, Al Boldi wrote:
> Con Kolivas wrote:
> > Here are full patches for rsdl 0.31 for various base kernels. A full
> > announce with a fresh -mm series will follow...
> >
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.20.3-rsdl-0.31.patch
>
> Thanks!  It looks much better now.

Thank you. You should find that most of the latency concerns you brought up
have been addressed by this change.

> With X nice'd at -10, and 11 hogs loading the cpu, interactivity looks good
> until the default timeslice/quota is exhausted and slows down.  Maybe
> adjusting this according to nice could help.

Not sure what you mean by that. It's still a fair distribution system, it's 
just that nice -10 has quite a bit more cpu allocated than nice 0. Eventually 
if you throw enough load at it it will slow down too.

> It may also be advisable to fix latencies according to nice, and adjust
> timeslices instead.  This may help scalability a lot, as there are some
> timing sensitive apps that may crash under high load.

You will find that is the case already with this version. Even under heavy 
load if you were to be running one server niced (say httpd nice 19 in the 
presence of mysql nice 0) the latencies would be drastically reduced compared 
to mainline behaviour. I am aware this becomes an issue for some heavily 
loaded servers because some servers run multithreaded while others do not, 
forcing the admins to nice their multithreaded ones. 

> Thanks!
>
> --
> Al

-- 
-ck


Re: SATA problems in 2.6.20.3

2007-03-16 Thread Charles Shannon Hendrix
On Fri, 16 Mar 2007 17:44:25 -0600
Robert Hancock <[EMAIL PROTECTED]> wrote:

> Charles Shannon Hendrix wrote:
> > I normally run a modified 2.6.19 kernel and it works great.
> > 
> > I recently tried 2.6.20 and had severe SATA problems with it.
> > 
> > Yesterday I tried 2.6.20.3, and the problems are still there.
> 
> Can you try 2.6.21-rc and see if the problem is fixed in those kernels?

OK.

sata_nv.adma=0 lets me run 2.6.20.3 for now.

I'll test 2.6.21-rc tomorrow some time.




-- 
shannon   | Work for something because it is good, not just because 
  | it stands a chance to succeed. 
  |-- Vaclav Havel


is SpellCaster ISDN still used?

2007-03-16 Thread Randy Dunlap

There is an API argument mismatch in it:

drivers/isdn/sc/init.c:281: warning: assignment from incompatible pointer type

interface->writebuf_skb = sndpkt;

where:

int sndpkt(int devId, int channel, struct sk_buff *data)
{
...
}

should look like this:

  /*
   * Send data using sk_buff's
   * Parameters:
   * int                  driverId
   * int                  local channel-number (0...)
   * int                  Flag: Need ACK for this packet.
   * struct sk_buff *skb  Data to send
   */
  int (*writebuf_skb) (int, int, int, struct sk_buff *);
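
One way to see the mismatch, and one possible way to paper over it, is sketched below in standalone C. The `writebuf_skb_fn` typedef and the `sndpkt_compat()` wrapper are hypothetical names made up for this illustration, and the body of `sndpkt()` is only a stand-in; whether the driver should ignore the ACK flag or honour it is a separate question for whoever maintains it.

```c
#include <stddef.h>

struct sk_buff; /* opaque here; the real type lives in the kernel */

/* Pointer type matching the writebuf_skb member quoted above. */
typedef int (*writebuf_skb_fn)(int, int, int, struct sk_buff *);

/* Stand-in for the driver's three-argument sndpkt(); body is illustrative. */
static int sndpkt(int devId, int channel, struct sk_buff *data)
{
    (void)data;
    return devId * 10 + channel;
}

/* Hypothetical adapter: accepts the ACK flag the interface passes, ignores it. */
static int sndpkt_compat(int devId, int channel, int ack, struct sk_buff *data)
{
    (void)ack;
    return sndpkt(devId, channel, data);
}
```

With the wrapper, an assignment such as `writebuf_skb_fn fn = sndpkt_compat;` type-checks, whereas assigning `sndpkt` directly produces the incompatible pointer type warning shown above.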


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: [BUG] 2.6.21-rc1,2,3 regressions on my system that I found so far

2007-03-16 Thread Len Brown
On Friday 16 March 2007 19:44, Thomas Gleixner wrote:
> Maxim,
> 
> On Fri, 2007-03-16 at 12:30 +0200, Maxim Levitsky wrote:
> > 3) Sometimes I get this (once in three boots or so)
> > 
> > [   36.217405] ENABLING IO-APIC IRQs
> > [   36.217587] ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
> > [   36.433917] APIC timer disabled due to verification failure.
> > 
> > And NO_HZ is disabled due to that (I get 1000/s timer's interrupts)
> > I haven't investigated that yet.
> > It looks like another new test that my hardware fails to perform... 
> 
> Yes, this is probably caused by SMM code trying to emulate a PS/2
> keyboard from a (maybe connected or not) USB keyboard. Unfortunately we
> have no way to disable this BIOS misfeature in the early boot process. 
> Arjan, Len ?

Nope.  By definition, SMM is invisible to the OS -- we don't even
get a bit that said it occurred (though we'd like one -- it would
be really helpful to diagnose issues like this one)

So go into BIOS SETUP and see if there is a USB Legacy Emulation
feature that you can disable.  Sometimes there is not, but disabling
onboard USB altogether may help at least prove the issue is in that area.

> I built in this test to rule out bogus LAPIC timer calibration values
> which are sometimes off by factor 2-10.
> 
> But I also built in a calibration against the PM-Timer, which turned out
> to be quite reliable and I think the additional verification step is
> only necessary for systems without PM-Timer.
> 
> That was a bit over cautious from my side. I send a patch to avoid this
> when PM-Timer is available in a separate mail.

PM-Timer was invented to work-around the issue that the TSC became unreliable
in the face of power management on laptops.  In particular, to be able
to time duration of OS idle where TSC stopped.

While it is not fine-grained, and it is not low-latency, it should
be very reliable.  My understanding is that it is implemented as
a simple divider right off the system 14MHz clock -- the signal
which most motherboard clocks are PLL multiplied up from --
including the 100MHz front-side bus which drives the LAPIC timer.

But that said, I don't understand why calibrating the LAPIC timer
using the PM-timer is going to be more reliable -- exactly how
and why did the previous calibration scheme fail?
Maybe I could follow the new logic in apic.c if I saw the "apic=debug"
output for this box.
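
To make the arithmetic behind such a calibration concrete: the ACPI PM-Timer ticks at 3.579545 MHz (the 14.31818 MHz reference divided by four), so another counter can be calibrated by sampling both over the same window. The sketch below is a simplified illustration only, not the logic in apic.c; the function names are invented here and a 24-bit PM-Timer is assumed.

```c
#include <stdint.h>

#define PMTMR_HZ   3579545u     /* ACPI PM-Timer frequency in Hz */
#define PMTMR_MASK 0x00FFFFFFu  /* assumption: 24-bit counter */

/* Elapsed PM-Timer ticks between two reads, tolerating one wraparound. */
static uint32_t pmtmr_delta(uint32_t start, uint32_t end)
{
    return (end - start) & PMTMR_MASK;
}

/*
 * Estimate the frequency of a second counter (e.g. the LAPIC timer)
 * that advanced by other_ticks while the PM-Timer advanced by pm_ticks.
 * Returns 0 if the window was empty.
 */
static uint64_t calibrate_hz(uint64_t other_ticks, uint32_t pm_ticks)
{
    if (pm_ticks == 0)
        return 0;
    return other_ticks * PMTMR_HZ / pm_ticks;
}
```

An SMM excursion during the measurement window lengthens both counts in the same proportion, which is one plausible reason a PM-Timer reference is more robust than a reference that can itself be delayed.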

cheers,
-Len




[PATCH] kprobes: fix sparse NULL warning

2007-03-16 Thread Randy Dunlap
From: Randy Dunlap <[EMAIL PROTECTED]>

Fix sparse NULL warnings:
kernel/kprobes.c:915:49: warning: Using plain integer as NULL pointer

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 kernel/kprobes.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- linux-2621-rc4.orig/kernel/kprobes.c
+++ linux-2621-rc4/kernel/kprobes.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -912,7 +913,7 @@ static int __kprobes debugfs_kprobe_init
if (!dir)
return -ENOMEM;
 
-   file = debugfs_create_file("list", 0444, dir , 0 ,
+   file = debugfs_create_file("list", 0444, dir, NULL,
&debugfs_kprobes_operations);
if (!file) {
debugfs_remove(dir);


Re: New kernel mouse recognition problem

2007-03-16 Thread Dmitry Torokhov
Hi Victor,

On Friday 16 March 2007 17:33, Victor Fernandes wrote:
> Dear kernel gurus,
> 
> I have a long experience with linux but not at the kernel level, so my
> apologies if this post is not appropriate for the list, but it seemed to
> me to be the only possible one to post my question.
> 
> Obviously I've also tried to find the solution on the archives (and more)
> but found nothing appropriate.
> 
> Problem: It appears that the new kernels (I currently have kernel 2.6.17-5
> (Mandriva 2007) installed, and tested others in the 2.6.x range) do not
> recognize my "Track Point" mouse anymore. I have on the boot logs
> (syslog) the following message: "logips2pp: Detected unknown logitech
> mouse model 0".
> 

Does the mouse still work despite this message?

> The same system with a kernel 2.6.12-12 (Mandriva 10.1) worked properly.
> 

How was it identified by 2.6.12? Could you please send me output of
cat /proc/bus/input/devices on 2.6.12 and 2.6.17?

-- 
Dmitry


Re: SATA problems in 2.6.20.3

2007-03-16 Thread Alistair John Strachan
On Friday 16 March 2007 23:44, you wrote:
> Charles Shannon Hendrix wrote:
> > I normally run a modified 2.6.19 kernel and it works great.
> >
> > I recently tried 2.6.20 and had severe SATA problems with it.
> >
> > Yesterday I tried 2.6.20.3, and the problems are still there.
>
> Can you try 2.6.21-rc and see if the problem is fixed in those kernels?

-rc4 specifically, it's the first one that's worked for me (possibly related).

(BTW Robert, the sata_nv shadow registers patch has been fine here with a 
patched -rc3 for just over a week now.)

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: [RFC] A need for "yesno"-function? (and "cleanup" of kernel.h)

2007-03-16 Thread Richard Knutsson

Jan Engelhardt wrote:
> On Mar 16 2007 16:24, Richard Knutsson wrote:
>
> > char yesno_chr(const bool value)
> > {
> >    return "ny"[value];
> > }
> >
> > char *yesno_str(const bool value)
> > {
> >    return &"no\0yes"[3 * value];
> > }
>
> static/extern const char *const yesno[] = {"no", "yes"};
> static inline const char *yesno_str(bool value)

Should we use "inline"? Isn't it better to leave that to the compiler?
Why the "const"?

> {
> return yesno[value];
> }

That's better :)
But I think a simple

static char *yesno_str(bool value)
{
return value ? "yes" : "no";
}

is preferable, don't you? It is simpler and we don't need to deal with an
unnecessary array (unless it may be used by itself, that is. Then I would go
for your implementation).

> #or
> #define yesno_str(value) yesno[!!(value)]

Why not "(bool)value" instead? We cast all the other times we want
something to be of a different kind.
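
On the `!!(value)` versus `(bool)value` question, the standalone C sketch below shows where the two can differ. With a C99 `_Bool`-backed `bool`, a cast normalises any non-zero value to 1, just like `!!`. If `bool` is instead a plain integer typedef (represented here by the hypothetical `legacy_bool`), the cast merely truncates. The function names are made up for this illustration.

```c
#include <stdbool.h>

/* Hypothetical pre-C99 style boolean: just a small integer type. */
typedef unsigned char legacy_bool;

/* Double negation always yields 0 or 1. */
static unsigned idx_bang(int v)   { return !!v; }

/* Conversion to C99 bool also yields 0 or 1. */
static unsigned idx_c99(int v)    { return (bool)v; }

/* Conversion to an integer typedef truncates instead of normalising. */
static unsigned idx_legacy(int v) { return (legacy_bool)v; }
```

For an input of 256, `idx_legacy()` yields 0, and for an input of 2 it yields 2, which would index past the end of a two-element `yesno[]` array. So the cast is only a safe replacement for `!!` where `bool` is known to be a real `_Bool`.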


Any thoughts where to put a function like this?

Richard Knutsson




Re: [PATCH/RFC] replace get_scheduled_cycles with sched_clock paravirt_op

2007-03-16 Thread Jeremy Fitzhardinge
Andrew Morton wrote:
> On Wed, 14 Mar 2007 12:07:14 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> 
> wrote:
>
>   
>> Subject: Add a sched_clock paravirt_op
>>
>> The tsc-based get_scheduled_cycles interface is not a good match for
>> Xen's runstate accounting, which reports everything in nanoseconds.
>>
>> This patch replaces this interface with a sched_clock interface, which
>> matches both Xen and VMI's requirements.
>>
>> In order to do this, we:
>>    1. replace get_scheduled_cycles with sched_clock
>>    2. hoist cycles_2_ns into a common header
>>    3. update vmi accordingly
>>
>> One thing to note: because sched_clock is implemented as a weak function in
>> kernel/sched.c, we must define a real function in order to override this weak
>> binding.  This means the usual paravirt_ops technique of using an inline
>> function won't work in this case.
>> 
>
> include/asm/paravirt.h: In function 'paravirt_sched_clock':
> include/asm/paravirt.h:281: warning: implicit declaration of function 
> 'PVOP_CALL0'
> include/asm/paravirt.h:281: error: expected expression before 'unsigned'
>   

Sorry, I didn't intend for it to be picked up; it depends on stuff
earlier in the patch series.  I just wanted to check with the VMI folks
that it works for them in principle.

J



Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-16 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
> I like this code very much; although it is unavoidably ugly, it is a
> nice general mechanism for doing code rewriting.  Much more
> elaboration on this below.
>

Thanks.

> static inline void local_irq_restore(const unsigned long flags)
> {
>vmi_wrap_call(
>SetInterruptMask, "pushl %0; popfl",
>VMI_NO_OUTPUT,
>1, VMI_IREG1 (flags),
>XCONC("cc", "memory"));
> }
>
> So the constraints are obvious and tied to the inline assembly.  But
> Jeremy seems to have done even better with the vcall stuff.  Prettier:
>
> +PVOP_VCALL0(setup_boot_clock);

Yeah, it doesn't try as hard as your example, so it's all based around
the function call ABI.  If you want to inline something, you need to do
that elsewhere, which I guess is OK because that's not the
common case (only very simple cases can be replaced by inlines, and only
a few of those are worth doing).

> We went through this design exercise, and thought it was pretty
> promising.  Basically, you would reserve a set of "local" relocation
> types that should never be emitted by the toolchain.  Then you can
> have complex relocations, such as "replace pushf; popf %0 with
> arbitrary code."  You can even leave the arguments unfixed and grant
> the compiler register allocation, as long as you took care to encode
> the input / output registers somewhere (in a .reloc section of some
> sort, or encoded in the relocation type itself).

I'm pretty sure that's not what he means.  The big objection to the
PVOP_* stuff is the fact that there are these massive macros full of
inline asm to wrap the calls, which have to be invoked in a fragile
type-unsafe way.  Adding custom relocs would suffer the same problem,
since you'd need inline asm to deal with them, and I'm deathly
frightened of whatever binutils would do if you mean real relocs.

I think the suggestion is much simpler.  If you convince gcc/binutils to
leave the .reloc section in vmlinux, and make that available to the
kernel itself, then you can scan all the kernel's relocs to find ones
which refer to paravirt_ops, and use those to determine which are
callsites that can be patched.

The main upside is that all the callsites are just normal C calls;
there's no special syntax or strange macros, and we get the full benefit
of typechecking, etc.

But I can see a few downsides compared to the current scheme:

   1. Identifying the callsites is a somewhat hackish process of looking
      at a reloc and doing a bit of disassembly to see what is using
      the reloc, to identify calls and jumps
   2. There's nothing explicit to tell us how much space there is to
      patch into; we just have to assume sizeof(indirect call/jmp)
   3. There's no information about the register environment at the
      callsite, so we just have to adopt normal C ABI rules.  For the
      patch sites in hand-written asm, this could be tricky.
   4. gcc could do strange things which prevent detection of patch
      sites.  For example, it might CSE the value of, say,
      paravirt_ops.irq_enable, which would be a reasonable optimisation,
      but prevent any of the resulting indirect calls from being
      patched.  In general it relies on gcc to generate identifiable
      callsites, which is a bit unpredictable.
   5. There's still a moderate amount of binutils hackery to get the
      relocs into the right form, and there's plenty of scope for it to
      screw up.
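
The reloc-scanning idea can be sketched in a few lines of standalone C. This is only an illustration of the filtering step under simplifying assumptions: `struct simple_reloc` and `find_patch_sites()` are hypothetical, real ELF relocation records carry more information, and the disassembly needed to confirm that a site is really a call or jump (downside 1) is left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified relocation record. */
struct simple_reloc {
    uintptr_t site;   /* address of the operand being relocated */
    uintptr_t target; /* address the operand refers to */
};

/*
 * Store in out[] the sites whose target lies inside the ops table
 * [ops_start, ops_end). Returns the number of candidates found, which
 * may exceed max_out; only the first max_out sites are stored.
 */
static size_t find_patch_sites(const struct simple_reloc *relocs, size_t n,
                               uintptr_t ops_start, uintptr_t ops_end,
                               uintptr_t *out, size_t max_out)
{
    size_t found = 0;

    for (size_t i = 0; i < n; i++) {
        if (relocs[i].target >= ops_start && relocs[i].target < ops_end) {
            if (found < max_out)
                out[found] = relocs[i].site;
            found++;
        }
    }
    return found;
}
```

Note that a callsite where the compiler has cached the function pointer in a register (downside 4) leaves only one reloc for several indirect calls, so a scan like this would miss the later calls.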


> [ Roswell technology deleted ]

J


[PATCH] usb/serial/io_edgeport: Convert to generic boolean

2007-03-16 Thread Richard Knutsson
Signed-off-by: Richard Knutsson <[EMAIL PROTECTED]>
---
Compile-tested with "allyes", "allmod" & "allno" on i386


diff --git a/drivers/usb/serial/io_edgeport.c b/drivers/usb/serial/io_edgeport.c
index 6a26a2e..aee0b24 100644
--- a/drivers/usb/serial/io_edgeport.c
+++ b/drivers/usb/serial/io_edgeport.c
@@ -111,7 +111,7 @@ struct edgeport_port {
 
struct TxFifo   txfifo; /* transmit fifo -- size will be maxTxCredits */
struct urb  *write_urb; /* write URB for this port */
-   char    write_in_progress;  /* TRUE while a write URB is outstanding */
+   bool    write_in_progress;  /* 'true' while a write URB is outstanding */
spinlock_t  ep_lock;
 
__u8    shadowLCR;  /* last LCR value received */
@@ -123,11 +123,11 @@ struct edgeport_port {
__u8    validDataMask;
__u32   baudRate;
 
-   char    open;
-   char    openPending;
-   char    commandPending;
-   char    closePending;
-   char    chaseResponsePending;
+   bool    open;
+   bool    openPending;
+   bool    commandPending;
+   bool    closePending;
+   bool    chaseResponsePending;
 
wait_queue_head_t   wait_chase; /* for handling sleeping while waiting for chase to finish */
wait_queue_head_t   wait_open;  /* for handling sleeping while waiting for open to finish */
@@ -156,7 +156,7 @@ struct edgeport_serial {
__u8    bulk_in_endpoint;   /* the bulk in endpoint handle */
unsigned char * bulk_in_buffer; /* the buffer we use for the bulk in endpoint */
struct urb *read_urb;   /* our bulk read urb */
-   int read_in_progress;
+   bool    read_in_progress;
spinlock_t  es_lock;
 
__u8    bulk_out_endpoint;  /* the bulk out endpoint handle */
@@ -631,14 +631,14 @@ static void edge_interrupt_callback (struct urb *urb)
if (edge_serial->rxBytesAvail > 0 &&
!edge_serial->read_in_progress) {
dbg("%s - posting a read", __FUNCTION__);
-   edge_serial->read_in_progress = TRUE;
+   edge_serial->read_in_progress = true;
 
/* we have pending bytes on the bulk in pipe, send a request */
edge_serial->read_urb->dev = edge_serial->serial->dev;
result = usb_submit_urb(edge_serial->read_urb, GFP_ATOMIC);
if (result) {
dev_err(&edge_serial->serial->dev->dev, "%s - usb_submit_urb(read bulk) failed with result = %d\n", __FUNCTION__, result);
-   edge_serial->read_in_progress = FALSE;
+   edge_serial->read_in_progress = false;
}
}
spin_unlock(&edge_serial->es_lock);
@@ -695,13 +695,13 @@ static void edge_bulk_in_callback (struct urb *urb)
 
if (urb->status) {
dbg("%s - nonzero read bulk status received: %d", __FUNCTION__, urb->status);
-   edge_serial->read_in_progress = FALSE;
+   edge_serial->read_in_progress = false;
return;
}
 
if (urb->actual_length == 0) {
dbg("%s - read bulk callback with no data", __FUNCTION__);
-   edge_serial->read_in_progress = FALSE;
+   edge_serial->read_in_progress = false;
return;
}
 
@@ -725,10 +725,10 @@ static void edge_bulk_in_callback (struct urb *urb)
status = usb_submit_urb(edge_serial->read_urb, GFP_ATOMIC);
if (status) {
dev_err(&urb->dev->dev, "%s - usb_submit_urb(read bulk) failed, status = %d\n", __FUNCTION__, status);
-   edge_serial->read_in_progress = FALSE;
+   edge_serial->read_in_progress = false;
}
} else {
-   edge_serial->read_in_progress = FALSE;
+   edge_serial->read_in_progress = false;
}
 
spin_unlock(&edge_serial->es_lock);
@@ -759,7 +759,7 @@ static void edge_bulk_out_data_callback (struct urb *urb)
}
 
// Release the Write URB
-   edge_port->write_in_p

Re: [PATCH 2/2] xfs: stop using kmalloc in xfs_buf_get_noaddr

2007-03-16 Thread Christoph Hellwig
> It looks like you might need: for (i--; i >= 0; i--)
> (or: for (j = 0; j < i; j++) etc.)
> 
> Because if the initial alloc_page loop goes to completion then:
>  i == pagecount
> and if alloc_page loop terminates early then
>  bp->b_pages[i] == NULL
> So we have gone 1 too far in both cases and need to
> start free'ing back one.
> Unless I missed something.

No, I was missing something :)

Here's the updated version:


Index: linux-2.6/fs/xfs/linux-2.6/xfs_buf.c
===
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_buf.c   2007-03-16 15:32:20.0 +0100
+++ linux-2.6/fs/xfs/linux-2.6/xfs_buf.c    2007-03-16 15:35:10.0 +0100
@@ -314,7 +314,7 @@ xfs_buf_free(
 
ASSERT(list_empty(&bp->b_hash_list));
 
-   if (bp->b_flags & _XBF_PAGE_CACHE) {
+   if (bp->b_flags & (_XBF_PAGE_CACHE|_XBF_PAGES)) {
uint i;
 
if ((bp->b_flags & XBF_MAPPED) && (bp->b_page_count > 1))
@@ -323,18 +323,11 @@ xfs_buf_free(
for (i = 0; i < bp->b_page_count; i++) {
struct page *page = bp->b_pages[i];
 
-   ASSERT(!PagePrivate(page));
+   if (bp->b_flags & _XBF_PAGE_CACHE)
+   ASSERT(!PagePrivate(page));
page_cache_release(page);
}
_xfs_buf_free_pages(bp);
-   } else if (bp->b_flags & _XBF_KMEM_ALLOC) {
-/*
- * XXX(hch): bp->b_count_desired might be incorrect (see
- * xfs_buf_associate_memory for details), but fortunately
- * the Linux version of kmem_free ignores the len argument..
- */
-   kmem_free(bp->b_addr, bp->b_count_desired);
-   _xfs_buf_free_pages(bp);
}
 
xfs_buf_deallocate(bp);
@@ -764,41 +757,41 @@ xfs_buf_get_noaddr(
size_t  len,
xfs_buftarg_t   *target)
 {
-   size_t  malloc_len = len;
+   unsigned long   page_count = PAGE_ALIGN(len) >> PAGE_SHIFT;
+   int error, i;
xfs_buf_t   *bp;
-   void    *data;
-   int error;
 
bp = xfs_buf_allocate(0);
if (unlikely(bp == NULL))
goto fail;
_xfs_buf_initialize(bp, target, 0, len, 0);
 
- try_again:
-   data = kmem_alloc(malloc_len, KM_SLEEP | KM_MAYFAIL | KM_LARGE);
-   if (unlikely(data == NULL))
+   error = _xfs_buf_get_pages(bp, page_count, 0);
+   if (error)
goto fail_free_buf;
 
-   /* check whether alignment matches.. */
-   if ((__psunsigned_t)data !=
-   ((__psunsigned_t)data & ~target->bt_smask)) {
-   /* .. else double the size and try again */
-   kmem_free(data, malloc_len);
-   malloc_len <<= 1;
-   goto try_again;
-   }
-
-   error = xfs_buf_associate_memory(bp, data, len);
-   if (error)
+   for (i = 0; i < page_count; i++) {
+   bp->b_pages[i] = alloc_page(GFP_KERNEL);
+   if (!bp->b_pages[i])
+   goto fail_free_mem;
+   }
+   bp->b_flags |= _XBF_PAGES;
+
+   error = _xfs_buf_map_pages(bp, XBF_MAPPED);
+   if (unlikely(error)) {
+   printk(KERN_WARNING "%s: failed to map pages\n",
+   __FUNCTION__);
goto fail_free_mem;
-   bp->b_flags |= _XBF_KMEM_ALLOC;
+   }
 
xfs_buf_unlock(bp);
 
XB_TRACE(bp, "no_daddr", data);
return bp;
+
  fail_free_mem:
-   kmem_free(data, malloc_len);
+   while (--i >= 0)
+   __free_page(bp->b_pages[i]);
  fail_free_buf:
xfs_buf_free(bp);
  fail:
Index: linux-2.6/fs/xfs/linux-2.6/xfs_buf.h
===
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_buf.h   2007-03-13 18:18:05.0 +0100
+++ linux-2.6/fs/xfs/linux-2.6/xfs_buf.h    2007-03-16 15:34:20.0 +0100
@@ -63,7 +63,7 @@ typedef enum {
 
/* flags used only internally */
_XBF_PAGE_CACHE = (1 << 17),/* backed by pagecache */
-   _XBF_KMEM_ALLOC = (1 << 18),/* backed by kmem_alloc()  */
+   _XBF_PAGES = (1 << 18), /* backed by refcounted pages  */
_XBF_RUN_QUEUES = (1 << 19),/* run block device task queue */
_XBF_DELWRI_Q = (1 << 21),   /* buffer on delwri queue */
 } xfs_buf_flags_t;
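
The off-by-one discussed at the top of this mail is a general pattern: when an allocation loop stops at index i, slots 0 to i-1 hold valid allocations and slot i does not, so the unwind has to start one step back. The userspace sketch below illustrates this with malloc() in place of alloc_page(); `alloc_all()` and its `fail_at` fault-injection parameter are hypothetical and exist only to make the failure path easy to exercise.

```c
#include <stdlib.h>

/*
 * Fill slots[0..count-1] with allocations of the given size.
 * fail_at simulates an allocation failure at that index (-1 for none).
 * On failure, everything allocated so far is freed and -1 is returned.
 */
static int alloc_all(void **slots, int count, size_t size, int fail_at)
{
    int i;

    for (i = 0; i < count; i++) {
        slots[i] = (i == fail_at) ? NULL : malloc(size);
        if (!slots[i])
            goto fail;
    }
    return 0;

fail:
    /* slots[i] is NULL here, so start freeing at i - 1. */
    while (--i >= 0) {
        free(slots[i]);
        slots[i] = NULL;
    }
    return -1;
}
```

The `while (--i >= 0)` form is the same unwind used in the fail_free_mem path of the patch above; it requires `i` to be a signed type.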


[PATCH] usb/serial/whiteheat: Convert to generic boolean

2007-03-16 Thread Richard Knutsson
Signed-off-by: Richard Knutsson <[EMAIL PROTECTED]>
---
Compile-tested with "allyes", "allmod" & "allno" on i386


diff --git a/drivers/usb/serial/whiteheat.c b/drivers/usb/serial/whiteheat.c
index bf16e9e..27c5f8f 100644
--- a/drivers/usb/serial/whiteheat.c
+++ b/drivers/usb/serial/whiteheat.c
@@ -1109,7 +1109,7 @@ static int firm_send_command (struct usb_serial_port *port, __u8 command, __u8 *
command_port = port->serial->port[COMMAND_PORT];
command_info = usb_get_serial_port_data(command_port);
spin_lock_irqsave(&command_info->lock, flags);
-   command_info->command_finished = FALSE;
+   command_info->command_finished = false;

transfer_buffer = (__u8 *)command_port->write_urb->transfer_buffer;
transfer_buffer[0] = command;
@@ -1124,12 +1124,12 @@ static int firm_send_command (struct usb_serial_port *port, __u8 command, __u8 *
spin_unlock_irqrestore(&command_info->lock, flags);
 
/* wait for the command to complete */
-   wait_event_interruptible_timeout(command_info->wait_command, 
-   (command_info->command_finished != FALSE), COMMAND_TIMEOUT);
+   wait_event_interruptible_timeout(command_info->wait_command,
+   (bool)command_info->command_finished, COMMAND_TIMEOUT);
 
spin_lock_irqsave(&command_info->lock, flags);
 
-   if (command_info->command_finished == FALSE) {
+   if (command_info->command_finished == false) {
dbg("%s - command timed out.", __FUNCTION__);
retval = -ETIMEDOUT;
goto exit;
diff --git a/drivers/usb/serial/whiteheat.h b/drivers/usb/serial/whiteheat.h
index d714eff..f160797 100644
--- a/drivers/usb/serial/whiteheat.h
+++ b/drivers/usb/serial/whiteheat.h
@@ -20,10 +20,6 @@
 #define __LINUX_USB_SERIAL_WHITEHEAT_H
 
 
-#define FALSE  0
-#define TRUE   1
-
-
 /* WhiteHEAT commands */
 #define WHITEHEAT_OPEN 1   /* open the port */
 #define WHITEHEAT_CLOSE 2  /* close the port */


Re: [PATCH 1/2] xfs: use xfs_get_buf_noaddr for iclogs

2007-03-16 Thread Christoph Hellwig
On Mon, Mar 12, 2007 at 03:41:17PM +1100, David Chinner wrote:
> OTOH, all other buffers are supposed to be locked when under I/O.
> This change makes a special case for the log buffers, and I'd prefer
> not to have to remember that this behaviour changed for log buffers
> at some point in time.
> 
> I suggest that adding:

...

> + XFS_BUF_PSEMA(bp, PRIBIO);

...

> To lock the buffer should be added here. That way we don't change
> any semantics of the code at all.

Here's a patch with your suggestion implemented.  Seems to work
fine under heavy NFS load for me.  Note that the log recovery has
some inconsistencies already about doing I/O both on locked and
unlocked buffers.  Long-term it might be a good idea to change
xfs_get_buf_noaddr to return a locked buffer like xfs_get_buf(_flags)
does already.


Index: linux-2.6/fs/xfs/xfs_log.c
===
--- linux-2.6.orig/fs/xfs/xfs_log.c 2007-03-16 15:21:43.0 +0100
+++ linux-2.6/fs/xfs/xfs_log.c  2007-03-16 15:34:15.0 +0100
@@ -1199,11 +1199,18 @@ xlog_alloc_log(xfs_mount_t  *mp,
*iclogp = (xlog_in_core_t *)
  kmem_zalloc(sizeof(xlog_in_core_t), KM_SLEEP);
iclog = *iclogp;
-   iclog->hic_data = (xlog_in_core_2_t *)
- kmem_zalloc(iclogsize, KM_SLEEP | KM_LARGE);
-
iclog->ic_prev = prev_iclog;
prev_iclog = iclog;
+
+   bp = xfs_buf_get_noaddr(log->l_iclog_size, mp->m_logdev_targp);
+   if (!XFS_BUF_CPSEMA(bp))
+   ASSERT(0);
+   XFS_BUF_SET_IODONE_FUNC(bp, xlog_iodone);
+   XFS_BUF_SET_BDSTRAT_FUNC(bp, xlog_bdstrat_cb);
+   XFS_BUF_SET_FSPRIVATE2(bp, (unsigned long)1);
+   iclog->ic_bp = bp;
+   iclog->hic_data = bp->b_addr;
+
log->l_iclog_bak[i] = (xfs_caddr_t)&(iclog->ic_header);
 
head = &iclog->ic_header;
@@ -1216,11 +1223,6 @@ xlog_alloc_log(xfs_mount_t   *mp,
INT_SET(head->h_fmt, ARCH_CONVERT, XLOG_FMT);
memcpy(&head->h_fs_uuid, &mp->m_sb.sb_uuid, sizeof(uuid_t));
 
-   bp = xfs_buf_get_empty(log->l_iclog_size, mp->m_logdev_targp);
-   XFS_BUF_SET_IODONE_FUNC(bp, xlog_iodone);
-   XFS_BUF_SET_BDSTRAT_FUNC(bp, xlog_bdstrat_cb);
-   XFS_BUF_SET_FSPRIVATE2(bp, (unsigned long)1);
-   iclog->ic_bp = bp;
 
iclog->ic_size = XFS_BUF_SIZE(bp) - log->l_iclog_hsize;
iclog->ic_state = XLOG_STATE_ACTIVE;
@@ -1528,7 +1530,6 @@ xlog_dealloc_log(xlog_t *log)
}
 #endif
next_iclog = iclog->ic_next;
-   kmem_free(iclog->hic_data, log->l_iclog_size);
kmem_free(iclog, sizeof(xlog_in_core_t));
iclog = next_iclog;
}


Re: [PATCH/RFC] replace get_scheduled_cycles with sched_clock paravirt_op

2007-03-16 Thread Andrew Morton
On Wed, 14 Mar 2007 12:07:14 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:

> Subject: Add a sched_clock paravirt_op
> 
> The tsc-based get_scheduled_cycles interface is not a good match for
> Xen's runstate accounting, which reports everything in nanoseconds.
> 
> This patch replaces this interface with a sched_clock interface, which
> matches both Xen and VMI's requirements.
> 
> In order to do this, we:
>1. replace get_scheduled_cycles with sched_clock
>2. hoist cycles_2_ns into a common header
>3. update vmi accordingly
> 
> One thing to note: because sched_clock is implemented as a weak function in
> kernel/sched.c, we must define a real function in order to override this weak
> binding.  This means the usual paravirt_ops technique of using an inline
> function won't work in this case.

include/asm/paravirt.h: In function 'paravirt_sched_clock':
include/asm/paravirt.h:281: warning: implicit declaration of function 'PVOP_CALL0'
include/asm/paravirt.h:281: error: expected expression before 'unsigned'


Re: 2.6.21-rc3-mm2 hangs my opteron during bootup, ACPI?

2007-03-16 Thread Helge Hafting

Len Brown wrote:
> On Monday 12 March 2007 09:25, Luming Yu wrote:
>
> try acpi=off please.

Ok, it boots up fine with acpi=off.
Now the next step is to try without the mm patch?

Helge Hafting


[PATCH] i386: trust the PM-Timer calibration of the local APIC timer

2007-03-16 Thread Thomas Gleixner
When the PM-Timer is available for local APIC timer calibration we can skip
the verification of the calibrated time value. The resulting error is
quite small on a bunch of evaluated platforms and is less harmful than
the observed false positives.

We need to keep the verification on systems which have no PM-Timer to
avoid bogus local APIC timer calibrations that are off by a factor of 2-10,
which can be observed when switching off the PM-Timer support in the
kernel configuration.

The wrong calibration values are probably caused by SMM code trying to
emulate a PS/2 keyboard from a (maybe connected or not) USB keyboard.
This prohibits the accurate delivery of PIT interrupts, which are used
to calibrate the local APIC timer. Unfortunately we have no way to
disable this BIOS misfeature in the early boot process.

Also add the dropped cpu_relax() back to the wait loops.

Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/apic.c b/arch/i386/kernel/apic.c
index 2383bcf..92f4210 100644
--- a/arch/i386/kernel/apic.c
+++ b/arch/i386/kernel/apic.c
@@ -338,6 +338,7 @@ void __init setup_boot_APIC_clock(void)
void (*real_handler)(struct clock_event_device *dev);
unsigned long deltaj;
long delta, deltapm;
+   int pm_referenced = 0;
 
apic_printk(APIC_VERBOSE, "Using local APIC timer interrupts.\n"
"calibrating APIC timer ...\n");
@@ -357,7 +358,8 @@ void __init setup_boot_APIC_clock(void)
/* Let the interrupts run */
local_irq_enable();
 
-   while(lapic_cal_loops <= LAPIC_CAL_LOOPS);
+   while(lapic_cal_loops <= LAPIC_CAL_LOOPS)
+   cpu_relax();
 
local_irq_disable();
 
@@ -394,6 +396,7 @@ void __init setup_boot_APIC_clock(void)
   "%lu (%ld)\n", (unsigned long) res, delta);
delta = (long) res;
}
+   pm_referenced = 1;
}
 
/* Calculate the scaled math multiplication factor */
@@ -423,68 +426,41 @@ void __init setup_boot_APIC_clock(void)
calibration_result / (100 / HZ),
calibration_result % (100 / HZ));
 
-
-   apic_printk(APIC_VERBOSE, "... verify APIC timer\n");
-
-   /*
-* Setup the apic timer manually
-*/
local_apic_timer_verify_ok = 1;
-   levt->event_handler = lapic_cal_handler;
-   lapic_timer_setup(CLOCK_EVT_MODE_PERIODIC, levt);
-   lapic_cal_loops = -1;
 
-   /* Let the interrupts run */
-   local_irq_enable();
+   /* We trust the pm timer based calibration */
+   if (!pm_referenced) {
+   apic_printk(APIC_VERBOSE, "... verify APIC timer\n");
 
-   while(lapic_cal_loops <= LAPIC_CAL_LOOPS);
+   /*
+* Setup the apic timer manually
+*/
+   levt->event_handler = lapic_cal_handler;
+   lapic_timer_setup(CLOCK_EVT_MODE_PERIODIC, levt);
+   lapic_cal_loops = -1;
 
-   local_irq_disable();
+   /* Let the interrupts run */
+   local_irq_enable();
 
-   /* Stop the lapic timer */
-   lapic_timer_setup(CLOCK_EVT_MODE_SHUTDOWN, levt);
+   while(lapic_cal_loops <= LAPIC_CAL_LOOPS)
+   cpu_relax();
 
-   local_irq_enable();
+   local_irq_disable();
 
-   /* Jiffies delta */
-   deltaj = lapic_cal_j2 - lapic_cal_j1;
-   apic_printk(APIC_VERBOSE, "... jiffies delta = %lu\n", deltaj);
+   /* Stop the lapic timer */
+   lapic_timer_setup(CLOCK_EVT_MODE_SHUTDOWN, levt);
 
-   /* Check, if the PM timer is available */
-   deltapm = lapic_cal_pm2 - lapic_cal_pm1;
-   apic_printk(APIC_VERBOSE, "... PM timer delta = %ld\n", deltapm);
+   local_irq_enable();
 
-   local_apic_timer_verify_ok = 0;
+   /* Jiffies delta */
+   deltaj = lapic_cal_j2 - lapic_cal_j1;
+   apic_printk(APIC_VERBOSE, "... jiffies delta = %lu\n", deltaj);
 
-   if (deltapm) {
-   if (deltapm > (pm_100ms - pm_thresh) &&
-   deltapm < (pm_100ms + pm_thresh)) {
-   apic_printk(APIC_VERBOSE, "... PM timer result ok\n");
-   /* Check, if the jiffies result is consistent */
-   if (deltaj < LAPIC_CAL_LOOPS-2 ||
-   deltaj > LAPIC_CAL_LOOPS+2) {
-   /*
-* Not sure, what we can do about this one.
-* When high resultion timers are active
-* and the lapic timer does not stop in C3
-* we are fine. Otherwise more trouble might
-* be waiting. -- tglx
-*/
-   printk(KERN_WARNING "Global event device %s "
- 

SATA: Marvell 88SE6141 in Asus P5WDG2-WS not recognized

2007-03-16 Thread Tony Fogle

Hello everyone;

I saw some talk on this mailing list about 9 months ago about the Marvell 
88SE6141 Sata II not being recognized.


I just went from FC5 to a new Fedora Core 6 install, and even updated to 
the latest kernel 2.6.20-1.2925 on an x86_64.


The kernel is not seeing my 2nd set of 4 Sata II ports provided by the 
Marvell chipset.  It doesn't matter if I enable the raid management boot 
prom option or not.


I'm having no problems with the 4 Intel Sata II ports.

Any help would be appreciated.

Thanks,
Tony



[PATCH] Linux 2.4.x MTD CFI P30/P33 support

2007-03-16 Thread Korolev, Alexey
Hello Willy,

The following patch adds support for P30 and P33 NOR flash to
Linux 2.4.
This flash is a substitute for the J3 flash, which is widely used on Linux
2.4 kernels. Currently many customers wishing to replace J3 with P3x
face issues on Linux 2.4. This patch resolves the issues with P3x on all
generic Linux kernel versions since 2.4.21.

The patch just allows using minor version "4" in the CFI driver. Since
the differences between CFI minor version "3" and minor version "4" are
small, the CFI driver is not otherwise affected. The patch has been
verified on the Mainstone (PXA27x based) platform.
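
For illustration, the version check that the patch relaxes can be modelled
in user space roughly as below. This is only a sketch: the helper name
cfi_intelext_version_ok() is made up for this example and does not exist in
the driver, where the check is written inline against extp->MajorVersion
and extp->MinorVersion.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not part of the kernel): returns true when the
 * Intel extended query table version is one the patched driver accepts,
 * i.e. major version '1' and minor version '0' through '4'.
 */
static bool cfi_intelext_version_ok(char major, char minor)
{
	return major == '1' && minor >= '0' && minor <= '4';
}
```

With the unpatched upper bound of '3', a P30/P33 part reporting version 1.4
would fall into the "Unknown IntelExt Extended Query version" branch.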

Signed-off-by: Alexey Korolev <[EMAIL PROTECTED]>
==
--- a/drivers/mtd/chips/cfi_cmdset_0001.c   2003-06-13 18:51:34.0 +0400
+++ b/drivers/mtd/chips/cfi_cmdset_0001.c   2007-02-16 21:39:50.0 +0300
@@ -152,7 +152,7 @@
}

if (extp->MajorVersion != '1' || 
-   (extp->MinorVersion < '0' || extp->MinorVersion > '3')) {
+   (extp->MinorVersion < '0' || extp->MinorVersion > '4')) {
printk(KERN_WARNING "  Unknown IntelExt Extended Query "
   "version %c.%c.\n",  extp->MajorVersion,
   extp->MinorVersion);
===

Thanks,
Alexey


Re: SATA problems in 2.6.20.3

2007-03-16 Thread Robert Hancock

Charles Shannon Hendrix wrote:
> I normally run a modified 2.6.19 kernel and it works great.
>
> I recently tried 2.6.20 and had severe SATA problems with it.
>
> Yesterday I tried 2.6.20.3, and the problems are still there.


Can you try 2.6.21-rc and see if the problem is fixed in those kernels?

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [BUG] 2.6.21-rc1,2,3 regressions on my system that I found so far

2007-03-16 Thread Thomas Gleixner
Maxim,

On Fri, 2007-03-16 at 12:30 +0200, Maxim Levitsky wrote:
> 3) Sometimes I get this (once in three boots or so)
> 
> [   36.217405] ENABLING IO-APIC IRQs
> [   36.217587] ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
> [   36.433917] APIC timer disabled due to verification failure.
> 
> And NO_HZ is disabled due to that (I get 1000/s timer's interrupts)
> I haven't investigated that yet.
> It looks like another new test that my hardware fails to perform... 

Yes, this is probably caused by SMM code trying to emulate a PS/2
keyboard from a (maybe connected or not) USB keyboard. Unfortunately we
have no way to disable this BIOS misfeature in the early boot process. 
Arjan, Len ?

I built in this test to rule out bogus LAPIC timer calibration values
which are sometimes off by factor 2-10.

But I also built in a calibration against the PM-Timer, which turned out
to be quite reliable and I think the additional verification step is
only necessary for systems without PM-Timer.

That was a bit overcautious on my side. I am sending a patch to avoid this
when PM-Timer is available in a separate mail.
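
To make the above concrete, here is a rough user-space model of the
decision. It is a sketch only: the helper names are invented for this
example, LAPIC_CAL_LOOPS is given an illustrative value (in the kernel it
depends on HZ), and the window checks mirror the ones in the verification
code that becomes conditional.

```c
#include <stdbool.h>

#define LAPIC_CAL_LOOPS 10	/* illustrative value, assumption for this sketch */

/* PM-Timer delta must lie within pm_thresh of the expected 100ms count. */
static bool pm_delta_ok(long deltapm, long pm_100ms, long pm_thresh)
{
	return deltapm > (pm_100ms - pm_thresh) &&
	       deltapm < (pm_100ms + pm_thresh);
}

/* Jiffies delta is consistent if within +/- 2 of the calibration loops. */
static bool jiffies_delta_ok(long deltaj)
{
	return deltaj >= LAPIC_CAL_LOOPS - 2 &&
	       deltaj <= LAPIC_CAL_LOOPS + 2;
}

/* The extra verification run is only needed without a PM-Timer reference. */
static bool lapic_needs_verify(bool pm_referenced)
{
	return !pm_referenced;
}
```

So when the calibration itself was referenced against the PM-Timer, the
second run, and with it the false "verification failure", is skipped.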

tglx




Re: Linux 2.6.21-rc4

2007-03-16 Thread Michal Piotrowski

On 17/03/07, Jan Engelhardt <[EMAIL PROTECTED]> wrote:
> On Mar 16 2007 19:55, Michal Piotrowski wrote:
> >> > I've got *bad* news. Bug described here
> >> > http://www.ussg.iu.edu/hypermail/linux/kernel/0703.0/index.html#0889
> >> > http://www.ussg.iu.edu/hypermail/linux/kernel/0703.0/index.html#1165
> >> > probably leaked into mainline.
> >> >
> >> > Fsck!
> >>
> >> fsck indeed.  I don't even understand what's happening with that one - it
> >> seems like the kernel schedules a user process, but never deschedules it
> >> again.
> >>
> >
> > Tomorrow, I'll try to find out how to reproduce this bug.
>
> From #0889:
>
>  "I have noticed some strange system behavior. When i try to build a
>  kernel (medium load) - X, keyboard, mouse and sound hangs."
>
> Note that ping is handled in interrupt or softirq context. So something has
> locked up. Try without X? Or perhaps attach a serial console/netconsole, and
> when it hangs, use Sysrq to dump the process' states.

I already did this
http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4/git-console.log

> Jan

Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)


Re: [BUG] 2.6.21-rc1,2,3 regressions on my system that I found so far

2007-03-16 Thread Thomas Gleixner
On Fri, 2007-03-16 at 12:30 +0200, Maxim Levitsky wrote:
> Mar 14 00:22:23 MAIN kernel: [2.072875] caller is 
> check_tsc_sync_source+0x1d/0x100
> Mar 14 00:22:23 MAIN kernel: [2.072878]  [show_trace_log_lvl+26/48] 
> show_trace_log_lvl+0x1a/0x30
> Mar 14 00:22:23 MAIN kernel: [2.072931] checking TSC synchronization 
> [CPU#0 -> CPU#1]:
> Mar 14 00:22:23 MAIN kernel: [2.092922] Measured 72051818872 cycles TSC 
> warp between CPUs, turning off
> 
> It looks clear that preempt is enabled all the way in second cpu 
> initialization, ( I think that at least in check_tsc_sync_source, it should 
> be disabled,
> shouldn't it ? )

This should be fixed by commit d04f41e35343f1d788551fd3f753f51794f4afcf

tglx





Re: Linux 2.6.21-rc4

2007-03-16 Thread Jan Engelhardt

On Mar 16 2007 17:13, Chris Friesen wrote:
>
> This would seem to be a bug in the build system then.  Or are you
> supposed to "make clean" after every config change?

No. When .config is changed, include/linux/config/ is updated, which
causes everything that depends on it, one way or the other, to be rebuilt.
At least that is what I have observed for ages.


Jan
-- 


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-16 Thread Zachary Amsden

Jeremy Fitzhardinge wrote:
> Well, one thing to make clear is this is absolutely not a Xen-specific
> patch or piece of code.  This is part of the paravirt_ops infrastructure
> designed to remove the overhead of all the indirect calls which are
> scattered all over the place.  (Perhaps I should post the general
> paravirt and Xen specific patches in separate patch series to make this
> clear...).
>
> The idea is to wrap the callsite itself in the same manner as the
> other altinstructions so that the general patcher can, at the very
> least, convert the indirect call to a direct one, or nop it out if it's
> an indirect call.  This means that a pv_ops implementation can get about
> 90% of the benefit of patching without any extra effort.

I like this code very much; although it is unavoidably ugly, it is a 
nice general mechanism for doing code rewriting.  Much more elaboration 
on this below.

> So, I disagree with your characterisation that it's "limited"; this is a
> pretty general mechanism.  The fragile part is in using the PVOP_*
> macros properly to match the ABI's calling convention, particularly with
> tricky cases like passed and returned structures and 64-bit args.  But
> that just needs to be done once in one place, and is otherwise
> self-contained.

You could just use the VMI ABI, then patch everything at runtime to call 
into the Xen paravirt-ops backend ;)

> I would love a better mechanism.  I played with using things like gcc's
> builtin_apply stuff, and so on, but I could find no way to get gcc to
> set up the args and then be able to just emit the call itself under asm
> control.


I fought tooth and nail to get something cleaner than this for VMI back 
when it was a subarch.  In the end, the best I could do was wrap the 
constraints into prettier macros so the asm volatile stuff wasn't 
sticking out everywhere.  It was pretty, but the macros were so 
grotesque that I was exiled from my home planet.


static inline void local_irq_restore(const unsigned long flags)
{
   vmi_wrap_call(
   SetInterruptMask, "pushl %0; popfl",
   VMI_NO_OUTPUT,
   1, VMI_IREG1 (flags),
   XCONC("cc", "memory"));
}

So the constraints are obvious and tied to the inline assembly.  But 
Jeremy seems to have done even better with the vcall stuff.  Prettier:


+   PVOP_VCALL0(setup_boot_clock);





> I haven't looked at Dave's reply in detail, but I saw some mention of
> using relocs.  The idea is intriguing, but I don't quite see how it
> would all fit together.



We went through this design exercise, and thought it was pretty 
promising.  Basically, you would reserve a set of "local" relocation 
types that should never be emitted by the toolchain.  Then you can have 
complex relocations, such as "replace pushf; popf %0 with arbitrary 
code."  You can even leave the arguments unfixed and grant the compiler 
register allocation, as long as you took care to encode the input / 
output registers somewhere (in a .reloc section of some sort, or encoded 
in the relocation type itself).


Now you can make complex decisions at runtime, and apply choice 
functions to these relocations that can cope with a variety of different 
circumstances - you could encode not just paravirt-ops as relocations, 
but all of the alternative instructions, and smp alternatives, and even 
higher level constructs, such as choices made by the user with the 
kernel command line - some potential examples:


acpi=noirq
idle=halt

With proper synchronization, using something like stop_machine_run, you 
can even make these choices dynamically, and then relink the kernel in 
place to take faster paths.  And the technique is universal, so you 
could use it cross architecture, which would be really helpful for 
architectures that say, have really slow indirect branches.


Once the technique gains wide acceptance, you could use it for all 
kernel interfaces which have static function pointers for the post-init 
lifetime of the kernel.  Which might contribute to a global performance 
improvement of perhaps a couple percent.  But the cost is clearly the 
complexity.
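
As a very rough user-space analogy of the "indirect call converted to a
direct one" idea, consider the sketch below. Everything in it is
hypothetical (struct ops_table, struct call_site, patch_call_sites() and
the toy operations are invented for this example), and it only caches a
resolved function pointer per call site; it does not rewrite any
instructions the way the real patcher or a relocation-based scheme would.

```c
#include <stddef.h>

typedef long (*op_fn)(long);

enum op_id { OP_DOUBLE, OP_NEGATE, OP_COUNT };

/* Stand-in for an ops structure full of function pointers. */
struct ops_table {
	op_fn fn[OP_COUNT];
};

/* One record per call site: which op it uses, and the patched target. */
struct call_site {
	enum op_id op;
	op_fn direct;		/* NULL until the site has been "patched" */
};

static long native_double(long x) { return 2 * x; }
static long native_negate(long x) { return -x; }

/* Unpatched sites go through the table, patched ones call directly. */
static long call_site_invoke(const struct ops_table *ops,
			     const struct call_site *site, long arg)
{
	if (site->direct)
		return site->direct(arg);
	return ops->fn[site->op](arg);
}

/* Resolve every site against the final ops table; returns sites patched. */
static size_t patch_call_sites(const struct ops_table *ops,
			       struct call_site *sites, size_t n)
{
	size_t i, patched = 0;

	for (i = 0; i < n; i++) {
		if (ops->fn[sites[i].op]) {
			sites[i].direct = ops->fn[sites[i].op];
			patched++;
		}
	}
	return patched;
}
```

The point of the model is only that the choice is made once, after the ops
table is final, so the per-call lookup disappears from the hot path.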


I just had a slightly interesting idea - you could even catch bugs where 
dynamic assignments to function pointers fail to update the appropriate 
patch sites by checking for non .init code sections which write through 
accelerated_fn_ptr_t's using static checking from sparse.


Is that sort of what you were thinking of Dave?

Zach


Re: Linux 2.6.21-rc4

2007-03-16 Thread Jan Engelhardt

On Mar 16 2007 19:55, Michal Piotrowski wrote:
>> > I've got *bad* news. Bug described here
>> > http://www.ussg.iu.edu/hypermail/linux/kernel/0703.0/index.html#0889
>> > http://www.ussg.iu.edu/hypermail/linux/kernel/0703.0/index.html#1165
>> > probably leaked into mainline.
>> > 
>> > Fsck!
>> 
>> fsck indeed.  I don't even understand what's happening with that one - it
>> seems like the kernel schedules a user process, but never deschedules it
>> again.
>> 
>
> Tomorrow, I'll try to find out how to reproduce this bug.

From #0889:

 "I have noticed some strange system behavior. When i try to build a
 kernel (medium load) - X, keyboard, mouse and sound hangs."

Note that ping is handled in interrupt or softirq context. So something has
locked up. Try without X? Or perhaps attach a serial console/netconsole, and
when it hangs, use Sysrq to dump the process' states.


Jan
-- 


Re: [RFC, PATCH] Fixup COMPAT_VDSO to work with CONFIG_PARAVIRT

2007-03-16 Thread Jan Engelhardt

On Mar 15 2007 20:03, Zachary Amsden wrote:
> Well testing that is not so fun.  I installed SUSE Pro 9.0, and strings on
> ld.so contains the magic at_sysinfo assert!  But it doesn't install TLS
> libraries, so I'll have to install them by hand.

9.0 is kinda old. And if you want some TLS libs, install the _i686_ glibc
package (not done by default).


Jan
-- 


Re: [RFC] A need for "yesno"-function? (and "cleanup" of kernel.h) (was: Re: [KJ] [RFC] A need for a "yesno"-function?)

2007-03-16 Thread Jan Engelhardt

On Mar 16 2007 16:24, Richard Knutsson wrote:
>> > 
>> > char yesno_chr(const bool value)
>> > {
>> >return "ny"[value];
>> > }
>> > 
>> > char *yesno_str(const bool value)
>> > {
>> >return &"no\0yes"[3 * value];
>> > }

static/extern const char *const yesno[] = {"no", "yes"};
static inline const char *yesno_str(bool value)
{
return yesno[value];
}
#or
#define yesno_str(value) yesno[!!(value)]

>> > Thoughts?
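
A compilable variant of the table-based version, for reference. This is a
minimal sketch in plain C99: it assumes <stdbool.h> rather than the
kernel's own bool, and keeps yesno_chr() from the quoted proposal next to
it for comparison.

```c
#include <stdbool.h>

/* Lookup table indexed by a bool: false -> "no", true -> "yes". */
static const char *const yesno[] = { "no", "yes" };

static inline const char *yesno_str(bool value)
{
	return yesno[value];
}

/* Single-character form from the quoted proposal: 'n' or 'y'. */
static inline char yesno_chr(bool value)
{
	return "ny"[value];
}
```

Because the parameter is a real bool, any non-zero argument is converted to
1 before indexing; the macro form needs the explicit !! for the same reason.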


Jan
-- 


Re: [BUG] 2.6.21-rc1,2,3 regressions on my system that I found so far

2007-03-16 Thread Len Brown
On Friday 16 March 2007 06:30, Maxim Levitsky wrote:
> 
> Good day, 
> 
> I want to report regressions I have with 2.6.21-rc3 kernel.
> I use CONFIG_NO_HZ.

Do any of these issues go away with CONFIG_NO_HZ=n (or when booting with
nohz=off), or are they all independent of it?

thanks,
-Len

> 1) Both suspend to disk and suspend to RAM are completely broken:
> On vanilla 2.6.20 suspend to disk works perfectly and suspend to ram works 
> _almost_ perfectly (I will tell about that later).
> On 2.6.21-rc1 and later the system hangs even before suspend begins (suspend to 
> disk hangs before the image write, and after suspend to RAM 
> some devices are powered down (disk, power LEDs) and some are not (fans, 
> power), and the system hangs).
> 
> I did a git-bisect and I found which commit caused that:
>   e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change 
> code ordering in main.c (breaks  S3)
>   ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: 
> Change code ordering in disk.c (breaks swsusp, I don't use it, but I tested 
> it)
> 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: 
> Change code ordering in user.c (breaks uswsusp, that I use)
> 
> I reverted those commits and now system suspends correctly to disk, but 
> suspend to ram showed some more regressions.
> 
> 
> 2) After suspend to RAM I get this:
> 
> Mar 14 00:22:23 MAIN kernel: [2.072875] caller is 
> check_tsc_sync_source+0x1d/0x100
> Mar 14 00:22:23 MAIN kernel: [2.072878]  [show_trace_log_lvl+26/48] 
> show_trace_log_lvl+0x1a/0x30
> Mar 14 00:22:23 MAIN kernel: [2.072881]  [show_trace+18/32] 
> show_trace+0x12/0x20
> Mar 14 00:22:23 MAIN kernel: [2.072884]  [dump_stack+22/32] 
> dump_stack+0x16/0x20
> Mar 14 00:22:23 MAIN kernel: [2.072887]  [debug_smp_processor_id+173/176] 
> debug_smp_processor_id+0xad/0xb0
> Mar 14 00:22:23 MAIN kernel: [2.072891]  [check_tsc_sync_source+29/256] 
> check_tsc_sync_source+0x1d/0x100
> Mar 14 00:22:23 MAIN kernel: [2.072894]  [__cpu_up+80/384] 
> __cpu_up+0x50/0x180
> Mar 14 00:22:23 MAIN kernel: [2.072897]  [_cpu_up+98/208] 
> _cpu_up+0x62/0xd0
> Mar 14 00:22:23 MAIN kernel: [2.072901]  [cpu_up+46/80] cpu_up+0x2e/0x50
> Mar 14 00:22:23 MAIN kernel: [2.072903]  [enable_nonboot_cpus+110/160] 
> enable_nonboot_cpus+0x6e/0xa0
> Mar 14 00:22:23 MAIN kernel: [2.072906]  [enter_state+326/496] 
> enter_state+0x146/0x1f0
> Mar 14 00:22:23 MAIN kernel: [2.072909]  [state_store+174/192] 
> state_store+0xae/0xc0
> Mar 14 00:22:23 MAIN kernel: [2.072912]  [subsys_attr_store+43/64] 
> subsys_attr_store+0x2b/0x40
> Mar 14 00:22:23 MAIN kernel: [2.072917]  [sysfs_write_file+186/272] 
> sysfs_write_file+0xba/0x110
> Mar 14 00:22:23 MAIN kernel: [2.072920]  [vfs_write+150/352] 
> vfs_write+0x96/0x160
> Mar 14 00:22:23 MAIN kernel: [2.072923]  [sys_write+61/112] 
> sys_write+0x3d/0x70
> Mar 14 00:22:23 MAIN kernel: [2.072926]  [sysenter_past_esp+93/153] 
> sysenter_past_esp+0x5d/0x99
> Mar 14 00:22:23 MAIN kernel: [2.072929]  ===
> Mar 14 00:22:23 MAIN kernel: [2.072931] checking TSC synchronization 
> [CPU#0 -> CPU#1]:
> Mar 14 00:22:23 MAIN kernel: [2.092922] Measured 72051818872 cycles TSC 
> warp between CPUs, turning off
> 
> It looks clear that preempt is enabled all the way in second cpu 
> initialization, ( I think that at least in check_tsc_sync_source, it should 
> be disabled,
> shouldn't it ? )
> 
> Then I did add preempt_disable() / preempt_enable()  to this function , and  
> I still got this:
> 
> Mar 14 00:22:23 MAIN kernel: [2.072931] checking TSC synchronization 
> [CPU#0 -> CPU#1]:
> Mar 14 00:22:23 MAIN kernel: [2.092922] Measured 72051818872 cycles TSC 
> warp between CPUs, turning off
> 
> It happens after second CPU is brought back on-line.
> 
> Now I understand that this is TSC sync problem and I tried to do some tests:
> 
>  I tried to disable/enable second CPU by hand, eg I did number of times,
> 
> echo "0" > /sys/devices/system/cpu/cpu1/online
> echo "1" > /sys/devices/system/cpu/cpu1/online
> 
> and TSC sync was ok.
> 
> Then I disabled 2nd CPU, have suspended system to RAM , resumed it  , and 
> then enabled 2nd CPU and got same error message.
> Then I disabled cpufreq , and did above tests, and got same results.
> I think that maybe this error is false, that there is some difference in TSC 
> clock, but this difference is constant, and can be fixed
> 
> 3) Sometimes I get this (once in three boots or so)
> 
> [   36.217405] ENABLING IO-APIC IRQs
> [   36.217587] ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
> [   36.433917] APIC timer disabled due to verification failure.
> 
> And NO_HZ is disabled due to that (I get 1000/s timer's interrupts)
> I haven't investigated that yet.
> It looks like another new test that my hardware fails to perform... 
> 
> 
> And now I want to tell you about that _almost_ working suspend to ram I got 
> 

Re: [PATCH] UML utrace support, step 1

2007-03-16 Thread Roland McGrath
> Did I send the right patch?  The one I meant to send (appended below),
> indeed builds and runs without utrace-regset.patch and
> utrace-core.patch applied.  It's utrace-1 in the following:

That is not the same patch I tried before.  This one does apply and build
fine (after make defconfig ARCH=um).  

Technically utrace_native_view doesn't belong in the tracehook patch (but
in the regset one to follow it), but it is harmless.

I've merged it into my current tree.  You can get this via GIT if you like
that.  I do regular merges from Linus's GIT tree, so any upstream UML
changes that merge cleanly or have trivial conflict fixes I can do will be
taken care of automatically.  For future updates, please send me
incremental changes based on my current stuff; the GIT branches and the
2.6-current patches on my utrace web page always match up.  My patch set
generation uses just one fixed log message for utrace-tracehook-um.patch,
so tell me if you want to adjust it from log text in today's patch.


Thanks,
Roland


[PATCH] clockevents: Fix suspend/resume to disk hangs

2007-03-16 Thread Thomas Gleixner
I finally found a dual core box, which survives suspend/resume without
crashing in the middle of nowhere. Sigh, I never figured out from the
code and the bug reports what's going on.

The observed hangs are caused by a stale state transition of the clock
event devices, which keeps the RCU synchronization away from completion,
when the non boot CPU is brought back up.

Suspend/resume in oneshot mode needs the same care as the
periodic mode during suspend to RAM. My assumption that the state
transitions during the different shutdown/bringups of s2disk would go
through the periodic boot phase and then switch over to highres resp.
nohz mode was simply wrong.

Add the appropriate suspend / resume handling for the non periodic
modes.
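
The intended behaviour can be modelled in user space as below. This is a
toy sketch, not the kernel code: struct toy_tick_device and the toy_*
functions are invented for illustration and only track which state the
event device ends up in.

```c
enum tick_mode { MODE_PERIODIC, MODE_ONESHOT };
enum evt_state { EVT_SHUTDOWN, EVT_PERIODIC, EVT_ONESHOT };

/* Toy stand-in for a per-cpu tick device and its clock event device. */
struct toy_tick_device {
	enum tick_mode mode;	/* mode the tick layer operates in */
	enum evt_state state;	/* state of the underlying event device */
};

/* Suspend: shut the event device down regardless of the tick mode. */
static void toy_tick_suspend(struct toy_tick_device *td)
{
	td->state = EVT_SHUTDOWN;
}

/* Resume: bring the device back in whichever mode was active before. */
static void toy_tick_resume(struct toy_tick_device *td)
{
	if (td->mode == MODE_PERIODIC)
		td->state = EVT_PERIODIC;
	else
		td->state = EVT_ONESHOT;
}
```

Before this change only the MODE_PERIODIC case was handled on both paths,
so a device in oneshot mode kept a stale state across suspend/resume.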

Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 5567745..eadfce2 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -307,12 +307,19 @@ int tick_resume_broadcast(void)
spin_lock_irqsave(&tick_broadcast_lock, flags);
 
bc = tick_broadcast_device.evtdev;
-   if (bc) {
-   if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC &&
-   !cpus_empty(tick_broadcast_mask))
-   tick_broadcast_start_periodic(bc);
 
-   broadcast = cpu_isset(smp_processor_id(), tick_broadcast_mask);
+   if (bc) {
+   switch (tick_broadcast_device.mode) {
+   case TICKDEV_MODE_PERIODIC:
+   if(!cpus_empty(tick_broadcast_mask))
+   tick_broadcast_start_periodic(bc);
+   broadcast = cpu_isset(smp_processor_id(),
+ tick_broadcast_mask);
+   break;
+   case TICKDEV_MODE_ONESHOT:
+   broadcast = tick_resume_broadcast_oneshot(bc);
+   break;
+   }
}
spin_unlock_irqrestore(&tick_broadcast_lock, flags);
 
@@ -347,6 +354,16 @@ static int tick_broadcast_set_event(ktime_t expires, int force)
}
 }
 
+int tick_resume_broadcast_oneshot(struct clock_event_device *bc)
+{
+   clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);
+
+   if(!cpus_empty(tick_broadcast_oneshot_mask))
+   tick_broadcast_set_event(ktime_get(), 1);
+
+   return cpu_isset(smp_processor_id(), tick_broadcast_oneshot_mask);
+}
+
 /*
  * Reprogram the broadcast device:
  *
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 43ba1bd..bfda3f7 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -298,18 +298,17 @@ static void tick_shutdown(unsigned int *cpup)
spin_unlock_irqrestore(&tick_device_lock, flags);
 }
 
-static void tick_suspend_periodic(void)
+static void tick_suspend(void)
 {
struct tick_device *td = &__get_cpu_var(tick_cpu_device);
unsigned long flags;
 
spin_lock_irqsave(&tick_device_lock, flags);
-   if (td->mode == TICKDEV_MODE_PERIODIC)
-   clockevents_set_mode(td->evtdev, CLOCK_EVT_MODE_SHUTDOWN);
+   clockevents_set_mode(td->evtdev, CLOCK_EVT_MODE_SHUTDOWN);
spin_unlock_irqrestore(&tick_device_lock, flags);
 }
 
-static void tick_resume_periodic(void)
+static void tick_resume(void)
 {
struct tick_device *td = &__get_cpu_var(tick_cpu_device);
unsigned long flags;
@@ -317,6 +316,8 @@ static void tick_resume_periodic(void)
spin_lock_irqsave(&tick_device_lock, flags);
if (td->mode == TICKDEV_MODE_PERIODIC)
tick_setup_periodic(td->evtdev, 0);
+   else
+   tick_resume_oneshot();
spin_unlock_irqrestore(&tick_device_lock, flags);
 }
 
@@ -348,13 +349,13 @@ static int tick_notify(struct notifier_block *nb, unsigned long reason,
break;
 
case CLOCK_EVT_NOTIFY_SUSPEND:
-   tick_suspend_periodic();
+   tick_suspend();
tick_suspend_broadcast();
break;
 
case CLOCK_EVT_NOTIFY_RESUME:
if (!tick_resume_broadcast())
-   tick_resume_periodic();
+   tick_resume();
break;
 
default:
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index 75890ef..c9d203b 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -19,12 +19,13 @@ extern void tick_setup_oneshot(struct clock_event_device *newdev,
 extern int tick_program_event(ktime_t expires, int force);
 extern void tick_oneshot_notify(void);
 extern int tick_switch_to_oneshot(void (*handler)(struct clock_event_device *));
-
+extern void tick_resume_oneshot(void);
 # ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
 extern void tick_broadcast_setup_oneshot(struct clock_event_device *bc);
 extern void tick_broadcast_oneshot_control(unsigned long reason);
 extern void tick_broadcast

Re: Linux 2.6.21-rc4

2007-03-16 Thread Chris Friesen

Randy Dunlap wrote:

> allmodconfig on i386:
>
> WARNING: "default_idle" [arch/i386/kernel/apm.ko] undefined!
> WARNING: "machine_real_restart" [arch/i386/kernel/apm.ko] undefined!
> make[1]: *** [__modpost] Error 1
> make: *** [modules] Error 2
>
> Please ignore.
>
> I think that this was the result of doing 'make allyesconfig && make all'
> followed by 'make allmodconfig && make all' without doing a 'make clean'
> between them.


This would seem to be a bug in the build system then.  Or are you 
supposed to "make clean" after every config change?


Chris


Re: [ck] Re: RSDL v0.31

2007-03-16 Thread Dirk Schoebel
On Friday, 16 March 2007, Mike Galbraith wrote:
> On Sat, 2007-03-17 at 08:13 +1100, Con Kolivas wrote:
> > On Saturday 17 March 2007 02:34, Mike Galbraith wrote:
> > > On Sat, 2007-03-17 at 00:40 +1100, Con Kolivas wrote:
> > > > Here are full patches for rsdl 0.31 for various base kernels. A full
> > > > announce with a fresh -mm series will follow...
> > > >
> > > > http://ck.kolivas.org/patches/staircase-deadline/2.6.20.3-rsdl-0.31.patch
> > > > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-sched-rsdl-0.31.patch
> > > > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.31.patch
> > >
> > > It still has trouble with the x/gforce vs two niced encoders scenario.
> > > The previously reported choppiness is still present.
> > >
> > > I suspect that x/gforce landing in the expired array is the trouble,
> > > and that this will never be smooth without some kind of exemption.  I
> > > added some targeted unfairness to .30, and it didn't help much at all.
> > >
> > > Priorities going all the way to 1 were a surprise.
> >
> > It wasn't going to change that case without renicing X.
>
> Con.  You are trying to wedge a fair scheduler into an environment where
> totally fair simply can not possibly function.
>
> If this is your final answer to the problem space, I am done testing,
> and as far as _I_ am concerned, your scheduler is an utter failure.
>

I cannot let this comment stand like that. I have an AMD X2 4400+ Dual Core
running Gentoo and now kernel 2.6.21-rc3 with RSDL 0.30 (HZ=300).
Up till now, whenever I wanted to watch a movie I had to stop compiling with
more than one task for the movie to run without skips. When playing games I
have to renice the game (-15-) or else it would get 'choppy'.
With the new RSDL I compile packages with -j3 (reniced to 15), my wife has up
to 8 scientific computations running at the same time, and the game and a
movie still run without any visible flaws. The only thing I have seen so far
is that the mouse cursor was a little less responsive and scrolling in firefox
took a little longer. But amarok for music, the movie in mplayer, the 3d game,
everything ran smoothly despite a load of > 11. All this without renicing
anything but the compiles. With the mainline kernel, even watching a movie
under this load was impossible.
I used the staircase scheduler before RSDL, but even with staircase such
overload was not possible while watching a movie.
Mike, maybe use higher nice levels for your encoders, or just use one. Or maybe
check your memory; I guess if the memory bandwidth is too low there's no
scheduler which can foresee such a thing and react accordingly. Since you have
an HT system, it's just one physical ALU, so everything has to be squeezed onto
this one ALU; up to a certain degree it works, but not forever. And the lame
encoders, I suppose, won't wait very long for their data to get delivered from
memory, so they'll utilize the ALU quite a lot.
Con, continue your scheduler development as it helps in many cases which were
not possible otherwise. I'm amazed at the ability of the scheduler to handle a
5 times overloaded system without too much hassle.
Great work Con.

Dirk.

PS: Con, don't stress your neck too much, your health is the only thing you
have to keep for life.




Re: [stable] [PATCH] UML - arch_prctl should set thread fs

2007-03-16 Thread Greg KH
On Fri, Mar 16, 2007 at 12:10:15PM -0400, Jeff Dike wrote:
> [ This missed getting into -stable the first time I sent it ]

That's because it doesn't apply at all to the current 2.6.20.3 kernel
tree.  Can you rediff it for that one so that we can apply it properly?

thanks,

greg k-h


Re: Possible "struct pid" leak from tty_io.c

2007-03-16 Thread Catalin Marinas

On 16/03/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
> "Catalin Marinas" <[EMAIL PROTECTED]> writes:
> > It seems to fix the leak. I looked at the logs and proc_set_tty calls
> > put_pid twice for pid 245 (the unresolved leak) and get_pid for pid
> > 296, which is later passed to put_pid via do_tty_hangup.
>
> Ok.  Any chance you could help me track down which application is
> ultimately calling proc_set_tty (I think it has pid 296 in your case).


I'll look at this on Monday since I was trying it on an ARM embedded
platform in the office (with a Debian filesystem).

--
Catalin


Re: Linux 2.6.21-rc4

2007-03-16 Thread Randy Dunlap
On Fri, 16 Mar 2007 14:11:21 -0700 Randy Dunlap wrote:

> On Fri, 16 Mar 2007 09:33:54 -0700 (PDT) Linus Torvalds wrote:
> 
> > 
> > I pushed out the -git trees yesterday, but then got distracted, so the 
> > patches and tar-balls and the announcement got delayed until this morning. 
> > Oops. I'm a scatter-brain.
> 
> allmodconfig on i386:
> 
> WARNING: "default_idle" [arch/i386/kernel/apm.ko] undefined!
> WARNING: "machine_real_restart" [arch/i386/kernel/apm.ko] undefined!
> make[1]: *** [__modpost] Error 1
> make: *** [modules] Error 2

Please ignore.

I think that this was the result of doing 'make allyesconfig && make all'
followed by 'make allmodconfig && make all' without doing a 'make clean'
between them.

---
~Randy


Re: Conflict between ide and usb?

2007-03-16 Thread John Coppens
On Fri, 16 Mar 2007 16:34:15 -0400
[EMAIL PROTECTED] (Lennart Sorensen) wrote:

> Do you even have DMA enabled for the DVD drive?  Without it it will be
> very slow and painful for the CPU.  I also have noticed that many fast
> (16x) DVD writers must have an 80 wire cable or they won't work
> correctly and do nasty things to the system.
> 
> Check /proc/ide/hdX/settings|grep dma and see if it says 1 0 1 or 0 0 1.

I _knew_ I enabled the DMA, but it seems that it switches off when I start
the transfer:

hdc: DMA disabled
hdc: ide_intr: huh? expected NULL handler on exit
hdc: ATAPI reset complete
ISO 9660 Extensions: RRIP_1991A
VFS: busy inodes on changed media.

This is on hdc, which is the DVD. Harddisk has DMA on.

John


[RFC][PATCH] split file and anonymous page queues

2007-03-16 Thread Rik van Riel

Split the anonymous and file backed pages out onto their own pageout
queues.  This way we do not unnecessarily churn through lots of anonymous
pages when we do not want to swap them out anyway.

This should (with additional tuning) be a great step forward in
scalability, allowing Linux to run well on very large systems where
scanning through the anonymous memory (on our way to the page cache
memory we do want to evict) is slowing systems down significantly.

This patch has been stress tested and seems to work, but has not
been fine tuned or benchmarked yet.  For now the swappiness parameter
can be used to tweak swap aggressiveness up and down as desired, but
in the long run we may want to simply measure IO cost of page cache
and anonymous memory and auto-adjust.

We apply pressure to each set of pageout queues based on:
- the size of each queue
- the fraction of recently referenced pages in each queue,
  not counting used-once file pages
- swappiness (file IO is more efficient than swap IO)

Please take this patch for a spin and let me know what goes well
and what goes wrong.

More info on the patch can be found on:

http://linux-mm.org/PageReplacementDesign

Signed-off-by: Rik van Riel <[EMAIL PROTECTED]>

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.
--- linux-2.6.20.x86_64/fs/proc/proc_misc.c.vmsplit	2007-03-16 11:35:48.0 -0400
+++ linux-2.6.20.x86_64/fs/proc/proc_misc.c	2007-03-16 11:35:55.0 -0400
@@ -147,43 +147,47 @@ static int meminfo_read_proc(char *page,
 	 * Tagged format, for easy grepping and expansion.
 	 */
 	len = sprintf(page,
-		"MemTotal: %8lu kB\n"
-		"MemFree:  %8lu kB\n"
-		"Buffers:  %8lu kB\n"
-		"Cached:   %8lu kB\n"
-		"SwapCached:   %8lu kB\n"
-		"Active:   %8lu kB\n"
-		"Inactive: %8lu kB\n"
+		"MemTotal:   %8lu kB\n"
+		"MemFree:%8lu kB\n"
+		"Buffers:%8lu kB\n"
+		"Cached: %8lu kB\n"
+		"SwapCached: %8lu kB\n"
+		"Active(anon):   %8lu kB\n"
+		"Inactive(anon): %8lu kB\n"
+		"Active(file):   %8lu kB\n"
+		"Inactive(file): %8lu kB\n"
 #ifdef CONFIG_HIGHMEM
-		"HighTotal:%8lu kB\n"
-		"HighFree: %8lu kB\n"
-		"LowTotal: %8lu kB\n"
-		"LowFree:  %8lu kB\n"
-#endif
-		"SwapTotal:%8lu kB\n"
-		"SwapFree: %8lu kB\n"
-		"Dirty:%8lu kB\n"
-		"Writeback:%8lu kB\n"
-		"AnonPages:%8lu kB\n"
-		"Mapped:   %8lu kB\n"
-		"Slab: %8lu kB\n"
-		"SReclaimable: %8lu kB\n"
-		"SUnreclaim:   %8lu kB\n"
-		"PageTables:   %8lu kB\n"
-		"NFS_Unstable: %8lu kB\n"
-		"Bounce:   %8lu kB\n"
-		"CommitLimit:  %8lu kB\n"
-		"Committed_AS: %8lu kB\n"
-		"VmallocTotal: %8lu kB\n"
-		"VmallocUsed:  %8lu kB\n"
-		"VmallocChunk: %8lu kB\n",
+		"HighTotal:  %8lu kB\n"
+		"HighFree:   %8lu kB\n"
+		"LowTotal:   %8lu kB\n"
+		"LowFree:%8lu kB\n"
+#endif
+		"SwapTotal:  %8lu kB\n"
+		"SwapFree:   %8lu kB\n"
+		"Dirty:  %8lu kB\n"
+		"Writeback:  %8lu kB\n"
+		"AnonPages:  %8lu kB\n"
+		"Mapped: %8lu kB\n"
+		"Slab:   %8lu kB\n"
+		"SReclaimable:   %8lu kB\n"
+		"SUnreclaim: %8lu kB\n"
+		"PageTables: %8lu kB\n"
+		"NFS_Unstable:   %8lu kB\n"
+		"Bounce: %8lu kB\n"
+		"CommitLimit:%8lu kB\n"
+		"Committed_AS:   %8lu kB\n"
+		"VmallocTotal:   %8lu kB\n"
+		"VmallocUsed:%8lu kB\n"
+		"VmallocChunk:   %8lu kB\n",
 		K(i.totalram),
 		K(i.freeram),
 		K(i.bufferram),
 		K(cached),
 		K(total_swapcache_pages),
-		K(global_page_state(NR_ACTIVE)),
-		K(global_page_state(NR_INACTIVE)),
+		K(global_page_state(NR_ACTIVE_ANON)),
+		K(global_page_state(NR_INACTIVE_ANON)),
+		K(global_page_state(NR_ACTIVE_FILE)),
+		K(global_page_state(NR_INACTIVE_FILE)),
 #ifdef CONFIG_HIGHMEM
 		K(i.totalhigh),
 		K(i.freehigh),
--- linux-2.6.20.x86_64/fs/mpage.c.vmsplit	2007-02-04 13:44:54.0 -0500
+++ linux-2.6.20.x86_64/fs/mpage.c	2007-03-16 11:35:55.0 -0400
@@ -408,12 +408,12 @@ mpage_readpages(struct address_space *ma
 	&first_logical_block,
 	get_block);
 			if (!pagevec_add(&lru_pvec, page))
-__pagevec_lru_add(&lru_pvec);
+__pagevec_lru_add_file(&lru_pvec);
 		} else {
 			page_cache_release(page);
 		}
 	}
-	pagevec_lru_add(&lru_pvec);
+	pagevec_lru_add_file(&lru_pvec);
 	BUG_ON(!list_empty(pages));
 	if (bio)
 		mpage_bio_submit(READ, bio);
--- linux-2.6.20.x86_64/fs/cifs/file.c.vmsplit	2007-03-16 11:35:47.0 -0400
+++ linux-2.6.20.x86_64/fs/cifs/file.c	2007-03-16 11:35:56.0 -0400
@@ -1746,7 +1746,7 @@ static void cifs_copy_cache_pages(struct
 		SetPageUptodate(page);
 		unlock_page(page);
 		if (!pagevec_add(plru_pvec, page))
-			__pagevec_lru_add(plru_pvec);
+			__pagevec_lru_add_file(plru_pvec);
 		data += PAGE_CACHE_SIZE;
 	}
 	return;
@@ -1880,7 +1880,7 @@ static int cifs_readpages(struct file *f
 		bytes_read = 0;
 	}
 
-	pagevec_lru_add(&lru_pvec

Re: [PATCH] cleanpatch: a script to clean up stealth whitespace added by a patch

2007-03-16 Thread Sam Ravnborg
On Fri, Mar 16, 2007 at 02:45:06PM -0700, H. Peter Anvin wrote:
> This script is a companion to the "cleanfile" script.  This cleans
> up a patch in unified diff format *before* it is applied.  Note that
> the empty lines at the end of file detection *requires* that the diff was
> taken with at least one line of context around each hunk, or bad things
> will happen.
> 
> This script cleans up various classes of stealth whitespace.  In
> particular, it cleans up:
> 
> - Whitespace (spaces or tabs) before newline;
> - DOS line endings (CR before LF);
> - Space before tab (spaces are deleted or converted to tabs);
> - Empty lines at end of file.
> 
> Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]>

This and the companion "cleanfile" have been applied to kbuild.git.

Sam


Re: RSDL v0.31

2007-03-16 Thread Mike Galbraith
On Sat, 2007-03-17 at 08:13 +1100, Con Kolivas wrote:
> On Saturday 17 March 2007 02:34, Mike Galbraith wrote:
> > On Sat, 2007-03-17 at 00:40 +1100, Con Kolivas wrote:
> > > Here are full patches for rsdl 0.31 for various base kernels. A full
> > > announce with a fresh -mm series will follow...
> > >
> > > http://ck.kolivas.org/patches/staircase-deadline/2.6.20.3-rsdl-0.31.patch
> > > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-sched-rsdl-0.31.patch
> > > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.31.patch
> >
> > It still has trouble with the x/gforce vs two niced encoders scenario.
> > The previously reported choppiness is still present.
> >
> > I suspect that x/gforce landing in the expired array is the trouble, and
> > that this will never be smooth without some kind of exemption.  I added
> > some targeted unfairness to .30, and it didn't help much at all.
> >
> > Priorities going all the way to 1 were a surprise.
> 
> It wasn't going to change that case without renicing X.

Con.  You are trying to wedge a fair scheduler into an environment where
totally fair simply can not possibly function.

If this is your final answer to the problem space, I am done testing,
and as far as _I_ am concerned, your scheduler is an utter failure.

-Mike



Re: [PATCH] scsi: megaraid_sas - throttle io if cmds are in risk of being timed-out

2007-03-16 Thread James Bottomley
On Thu, 2007-03-01 at 13:29 -0800, Sumant Patro wrote:
> Driver to throttle IO to reduce risk of OS timing out cmds.
> 
> Implemented a circular queue to keep track of pending OS cmds in FW. 
> This queue is periodically (every 10 sec) checked by a timer routine.
> If there is any cmd that is in risk of getting timed-out by the OS, 
> the host->can_queue is reduced to 16 and MEGASAS_FW_BUSY flag is set. 
> The host->can_queue will be restored to default value when the following 
> two conditions are met : 5 secs has elapsed and the # of outstanding cmds
> in FW is less than 17.
> Also increased the per cmd timeout to 120*HZ from 90*HZ.

OK, this is still not nice.  What you need to be doing is intercepting
the timeout before it fires (and quiesces the machine).  Currently the
eh_timed_out() callback is only exposed to transport classes, I'll put
it back into the host template and then you can use it.
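
For reference, the throttling policy described in the quoted patch text can be sketched roughly as below. This is a userspace illustration only, not the megaraid_sas code: struct host_state and both functions are hypothetical names, and it assumes the 5-second delay is measured from the moment throttling started, which the description does not state explicitly.

```c
#include <assert.h>
#include <stdbool.h>

#define THROTTLE_QUEUE_DEPTH 16	/* reduced can_queue, from the description */
#define RESTORE_DELAY_SECS   5	/* from the description */
#define RESTORE_MAX_PENDING  17	/* restore only below this many cmds */

/* Hypothetical stand-in for the per-host driver state. */
struct host_state {
	int default_can_queue;	/* queue depth when not throttled */
	int can_queue;		/* current queue depth */
	bool fw_busy;		/* mirrors the MEGASAS_FW_BUSY idea */
	long throttled_at;	/* seconds; meaningful only while fw_busy */
};

/* Called by the periodic check when some cmd is close to timing out. */
static void throttle(struct host_state *h, long now)
{
	assert(h->default_can_queue > THROTTLE_QUEUE_DEPTH);
	h->can_queue = THROTTLE_QUEUE_DEPTH;
	h->fw_busy = true;
	h->throttled_at = now;
}

/* Restore the default depth once both conditions hold. */
static bool maybe_restore(struct host_state *h, long now, int outstanding)
{
	if (!h->fw_busy)
		return false;
	if (now - h->throttled_at < RESTORE_DELAY_SECS)
		return false;		/* assumed: delay counts from throttle */
	if (outstanding >= RESTORE_MAX_PENDING)
		return false;		/* firmware still too loaded */
	h->can_queue = h->default_can_queue;
	h->fw_busy = false;
	return true;
}
```

The point of the objection above is that this kind of polling reacts only after a command is already close to its deadline, whereas a timeout callback would let the driver act on the timeout event itself.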

James




Re: [patch 17/26] Xen-paravirt_ops: Add nosegneg capability to the vsyscall page notes

2007-03-16 Thread Roland McGrath
> I'm not quite sure what you're suggesting here though.  Do you mean one of:
> 
> NOTE_KERNELCAP_BEGIN(1, 1)
> NOTE_KERNELCAP(0, "nosegneg")
> NOTE_KERNELCAP_END
> 
> or
> 
> NOTE_KERNELCAP_BEGIN(1, 2)
> NOTE_KERNELCAP(1, "nosegneg")
> NOTE_KERNELCAP_END
> 
> is the correct thing to use?

Yes.  (Sorry about the typo, 1 and 0 are close enough aren't they? ;-)

> > Some pre-release glibc's (before 2.4) had a bug in the code that parses
> > this, and would crash parsing the correct note.  Using the wrong bit value
> > with nonmatching mask worked around this.  IIRC, no glibc release ever
> > included the buggy version of the code.  In nonbuggy glibc, the mismatched
> > value causes the "nosegneg" to be omitted from the directory search (under
> > LD_LIBRARY_PATH and default directories), though ldconfig-based lookups
> > will work (the most common case).
> 
> Are you saying that one of the corrected forms might cause old glibcs to
> crash, or just ignore nosegneg?

Yes, the affected glibc crashed with the canonical form (the first above).
I'm not sure such a glibc exists in the wild today, maybe only in some FC5
test releases (I CC'd Jakub so he can verify that at least as far as FC).
Rik van Riel discovered that the s/0/1/ tweak sufficed for common daily use
(i.e. ld.so.cache hits), and did that temporarily when he was hacking on
Xen Linux kernels for Fedora, but reverted it after we fixed glibc.

The uncorrected form causes current (correct) glibc to ignore nosegneg 
(for cache misses).

Looking at the buggy version of the code, I think it will not crash with
the second form above, just avoid using bit 0.  (But I wouldn't swear to it
without testing it.)  The second form should certainly be fine with the
current glibc.  Just make sure that "kernelcap 1 nosegneg" is used in the
ld.so.conf.d file, to match 1 in the NOTE_KERNELCAP first arg.
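
A small sketch of the consistency rule under discussion, assuming the second argument of NOTE_KERNELCAP_BEGIN is a bitmask and the first argument of NOTE_KERNELCAP is a bit number within that mask. kernelcap_enabled() is a hypothetical helper for illustration, not glibc code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if the capability's bit number is covered by the note's mask,
 * 0 otherwise. Under the assumption above, a capability whose bit is not
 * in the mask is the "mismatched" case that gets ignored.
 */
static int kernelcap_enabled(uint32_t mask, unsigned int bit)
{
	assert(bit < 32);
	return (int)((mask >> bit) & 1u);
}
```

With this reading, both forms quoted above are self-consistent (mask 1 with bit 0, mask 2 with bit 1), while mask 1 combined with bit 1 is not.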


Thanks,
Roland


Re: [patch 03/26] Xen-paravirt_ops: use paravirt_nop to consistently mark no-op operations

2007-03-16 Thread Chris Wright
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
> Chris Wright wrote:
> > I mean like this (bunch of work, for a type check that we're really ignoring
> > anyway, but this is the idea...)
> 
> Oh, I see.  I think this is the best argument yet for the current
> arrangement...

Heh, like I said it's a bunch of work for literally nothing ;-)


Re: [patch 03/26] Xen-paravirt_ops: use paravirt_nop to consistently mark no-op operations

2007-03-16 Thread Jeremy Fitzhardinge
Chris Wright wrote:
> I mean like this (bunch of work, for a type check that we're really ignoring
> anyway, but this is the idea...)
>   

Oh, I see.  I think this is the best argument yet for the current
arrangement...

J


[PATCH 2/2] Export not_critical_when_idle feature in workqueue and use it in ondemand

2007-03-16 Thread Venkatesh Pallipadi


Add a new not_critical_when_idle parameter to queue_delayed_work_on(). This
parameter can be used to schedule work that is 'unimportant' while the CPU
is idle and can run later, when the CPU eventually comes out of idle.

Use this parameter in cpufreq ondemand governor.
 
Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>

Index: linux-2.6.20/kernel/workqueue.c
===
--- linux-2.6.20.orig/kernel/workqueue.c 2007-03-16 14:51:00.0 -0700
+++ linux-2.6.20/kernel/workqueue.c 2007-03-16 14:51:21.0 -0700
@@ -271,11 +271,13 @@
  * @wq: workqueue to use
  * @dwork: work to queue
  * @delay: number of jiffies to wait before queueing
+ * @not_critical_when_idle: 1 indicates work is not critical when CPU is idle
  *
  * Returns 0 if @work was already on a queue, non-zero otherwise.
  */
 int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
-   struct delayed_work *dwork, unsigned long delay)
+   struct delayed_work *dwork, unsigned long delay,
+   int not_critical_when_idle)
 {
int ret = 0;
struct timer_list *timer = &dwork->timer;
@@ -290,7 +292,7 @@
timer->expires = jiffies + delay;
timer->data = (unsigned long)dwork;
timer->function = delayed_work_timer_fn;
-   add_timer_on(timer, cpu, 0);
+   add_timer_on(timer, cpu, not_critical_when_idle);
ret = 1;
}
return ret;
@@ -614,7 +616,7 @@
 int schedule_delayed_work_on(int cpu,
struct delayed_work *dwork, unsigned long delay)
 {
-   return queue_delayed_work_on(cpu, keventd_wq, dwork, delay);
+   return queue_delayed_work_on(cpu, keventd_wq, dwork, delay, 0);
 }
 EXPORT_SYMBOL(schedule_delayed_work_on);
 
Index: linux-2.6.20/drivers/cpufreq/cpufreq_ondemand.c
===
--- linux-2.6.20.orig/drivers/cpufreq/cpufreq_ondemand.c 2007-03-16 14:51:00.0 -0700
+++ linux-2.6.20/drivers/cpufreq/cpufreq_ondemand.c 2007-03-16 14:51:21.0 -0700
@@ -457,7 +457,7 @@
dbs_info->freq_lo,
CPUFREQ_RELATION_H);
}
-   queue_delayed_work_on(cpu, kondemand_wq, &dbs_info->work, delay);
+   queue_delayed_work_on(cpu, kondemand_wq, &dbs_info->work, delay, 1);
unlock_policy_rwsem_write(cpu);
 }
 
@@ -472,7 +472,7 @@
dbs_info->sample_type = DBS_NORMAL_SAMPLE;
INIT_DELAYED_WORK(&dbs_info->work, do_dbs_timer);
queue_delayed_work_on(dbs_info->cpu, kondemand_wq, &dbs_info->work,
- delay);
+ delay, 1);
 }
 
 static inline void dbs_timer_exit(struct cpu_dbs_info_s *dbs_info)
Index: linux-2.6.20/include/linux/workqueue.h
===
--- linux-2.6.20.orig/include/linux/workqueue.h 2007-03-16 14:51:00.0 -0700
+++ linux-2.6.20/include/linux/workqueue.h  2007-03-16 14:51:21.0 -0700
@@ -170,7 +170,8 @@
 extern int FASTCALL(queue_work(struct workqueue_struct *wq, struct work_struct *work));
 extern int FASTCALL(queue_delayed_work(struct workqueue_struct *wq, struct delayed_work *work, unsigned long delay));
 extern int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
-   struct delayed_work *work, unsigned long delay);
+   struct delayed_work *work, unsigned long delay,
+   int not_critical_when_idle);
 extern void FASTCALL(flush_workqueue(struct workqueue_struct *wq));
 
 extern int FASTCALL(schedule_work(struct work_struct *work));


[PATCH 1/2] Add not_critical_when_idle timer

2007-03-16 Thread Venkatesh Pallipadi

Introduce a new kind of timer: not_critical_when_idle timers. These timers
work normally when the system is busy, but will not cause the CPU to come out
of idle (just to service this timer) when the CPU is idle. Instead, such a
timer is serviced when the CPU eventually wakes up for a subsequent
critical_when_idle timer.

The main advantage of this is to avoid unnecessary timer interrupts when
the CPU is idle. If the routine currently called by a timer can wait until the
next event without any issues, this new timer can be used to set up the timer
event for that routine. This, with dynticks, allows CPUs to be lazy, letting
them stay idle for extended periods of time by reducing unnecessary wakeups
and thereby reducing power consumption.

This patch:
Builds this new timer on top of the existing timer infrastructure. It uses
the last bit in the 'base' pointer of the timer_list structure to store this
extra information about the timer. The __next_timer_interrupt() function
skips over these not_critical_when_idle timers when the CPU looks for the
next timer event for which it has to wake up.

This is exported by a new interface, add_timer_with_hint(), and a new
parameter is also added to the existing add_timer_on() interface.
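
The two ideas in this description, a flag hidden in the low bit of an aligned pointer and a next-event scan that ignores flagged timers, can be sketched in userspace as below. All names here (struct base, struct timer, tagged_*, next_wakeup) are hypothetical stand-ins rather than the patch's TBASE_* macros. The sketch stores the tagged value as a uintptr_t instead of a pointer, and it ignores jiffies wrap-around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU timer base. */
struct base { int id; };

/* Hypothetical stand-in for struct timer_list. */
struct timer {
	unsigned long expires;
	uintptr_t base;		/* base pointer, low bit = "deferrable" flag */
};

#define FLAG_MASK ((uintptr_t)1)

/* Combine pointer and flag; the pointer must be at least 2-byte aligned. */
static uintptr_t tagged_merge(struct base *ptr, int flag)
{
	assert(((uintptr_t)ptr & FLAG_MASK) == 0);
	return (uintptr_t)ptr | (flag ? FLAG_MASK : 0);
}

/* Strip the flag bit to recover the real pointer. */
static struct base *tagged_ptr(uintptr_t tagged)
{
	return (struct base *)(tagged & ~FLAG_MASK);
}

/* Read the flag bit. */
static int tagged_flag(uintptr_t tagged)
{
	return (int)(tagged & FLAG_MASK);
}

/*
 * Find the earliest expiry among timers that must wake an idle CPU.
 * Returns 1 and stores the expiry in *out if one exists, else 0.
 */
static int next_wakeup(const struct timer *timers, size_t n,
		       unsigned long *out)
{
	int found = 0;

	for (size_t i = 0; i < n; i++) {
		if (tagged_flag(timers[i].base))
			continue;	/* not critical when idle: skip it */
		if (!found || timers[i].expires < *out) {
			*out = timers[i].expires;
			found = 1;
		}
	}
	return found;
}
```

The alignment assertion plays the same role as the check added in the base allocation path of the patch: the trick only works if the low bit of a valid base pointer is always zero.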

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>

Index: linux-2.6.20/kernel/timer.c
===
--- linux-2.6.20.orig/kernel/timer.c2007-03-16 14:13:19.0 -0700
+++ linux-2.6.20/kernel/timer.c 2007-03-16 14:51:15.0 -0700
@@ -74,7 +74,7 @@
tvec_t tv3;
tvec_t tv4;
tvec_t tv5;
-} cacheline_aligned_in_smp;
+} cacheline_aligned;
 
 typedef struct tvec_t_base_s tvec_base_t;
 
@@ -325,7 +325,7 @@
tvec_base_t *base;
 
for (;;) {
-   base = timer->base;
+   base = TBASE_GET_BASE_PTR(timer->base);
if (likely(base != NULL)) {
spin_lock_irqsave(&base->lock, *flags);
if (likely(base == timer->base))
@@ -364,12 +364,15 @@
 * the timer is serialized wrt itself.
 */
if (likely(base->running_timer != timer)) {
+   unsigned long tflag;
+   tflag = TBASE_GET_DELAYED_ON_IDLE(timer->base);
/* See the comment in lock_timer_base() */
timer->base = NULL;
spin_unlock(&base->lock);
base = new_base;
spin_lock(&base->lock);
-   timer->base = base;
+   timer->base =
+   TBASE_MERGE_DELAYED_ON_IDLE(new_base, tflag);
}
}
 
@@ -386,10 +389,12 @@
  * add_timer_on - start a timer on a particular CPU
  * @timer: the timer to be added
  * @cpu: the CPU to start it on
+ * @not_critical_when_idle: 1 to indicate timer is not critical and
+ *  can be delayed when CPU is idle
  *
  * This is not very scalable on SMP. Double adds are not possible.
  */
-void add_timer_on(struct timer_list *timer, int cpu)
+void add_timer_on(struct timer_list *timer, int cpu, int not_critical_when_idle)
 {
tvec_base_t *base = per_cpu(tvec_bases, cpu);
unsigned long flags;
@@ -397,7 +402,7 @@
timer_stats_timer_set_start_info(timer);
BUG_ON(timer_pending(timer) || !timer->function);
spin_lock_irqsave(&base->lock, flags);
-   timer->base = base;
+   timer->base = TBASE_MERGE_DELAYED_ON_IDLE(base, not_critical_when_idle);
internal_add_timer(base, timer);
spin_unlock_irqrestore(&base->lock, flags);
 }
@@ -548,7 +553,7 @@
 * don't have to detach them individually.
 */
list_for_each_entry_safe(timer, tmp, &tv_list, entry) {
-   BUG_ON(timer->base != base);
+   BUG_ON(TBASE_GET_BASE_PTR(timer->base) != base);
internal_add_timer(base, timer);
}
 
@@ -634,6 +639,9 @@
index = slot = timer_jiffies & TVR_MASK;
do {
list_for_each_entry(nte, base->tv1.vec + slot, entry) {
+   if (TBASE_GET_DELAYED_ON_IDLE(nte->base))
+   continue;
+ 
found = 1;
expires = nte->expires;
/* Look at the cascade bucket(s)? */
@@ -1602,6 +1610,13 @@
cpu_to_node(cpu));
if (!base)
return -ENOMEM;
+
+   /* Make sure that tvec_base is 2 byte aligned */
+   if (TBASE_GET_DELAYED_ON_IDLE(base)) {
+   WARN_ON(1);
+   kfree(base);
+   return -ENOMEM;
+   }
memset(base, 0, sizeof(*base));
per_cpu(tvec_bases, cpu) = base;
}

Re: Possible "struct pid" leak from tty_io.c

2007-03-16 Thread Eric W. Biederman
"Catalin Marinas" <[EMAIL PROTECTED]> writes:

> On 14/03/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>> How does this look?
>
> It seems to fix the leak. I looked at the logs and proc_set_tty calls
> put_pid twice for pid 245 (the unresolved leak) and get_pid for pid
> 296, which is later passed to put_pid via do_tty_hangup.

Ok.  Any chance you could help me track down which application is
ultimately calling proc_set_tty (I think it has pid 296 in your case).

From what I can tell reading the source, it is calling
TIOCSCTTY with the arg field set to 1, which is a Linux extension
to force other people off of the tty.

My skimming of the implementation says to me that we are forcing
other processes off the tty in the wrong way.  I think we should call
tty_vhangup (so those processes kicked off get normal terminal hangup
behavior) and not simply session_clear_tty.  However, if I am correct the
implementation has been broken for over a decade, so I am reluctant to
just change it without tracking down the users.

The patch below is what I am looking at for a comprehensive fix.  I
think it fixes the second set of leaks in a better manner by simply
fixing callers to do the sane thing.
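
The kind of leak at issue can be shown with a minimal userspace reference-counting sketch. struct ref, struct holder and the functions below are hypothetical stand-ins for struct pid, the tty's session/pgrp fields and get_pid/put_pid; they are not the kernel implementations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted struct pid. */
struct ref { int count; };

/* Take a reference; NULL is passed through untouched. */
static struct ref *ref_get(struct ref *r)
{
	if (r)
		r->count++;
	return r;
}

/* Drop a reference; NULL is accepted and ignored. */
static void ref_put(struct ref *r)
{
	if (!r)
		return;
	assert(r->count > 0);
	r->count--;
}

/* Hypothetical stand-in for the two pid pointers held by a tty. */
struct holder {
	struct ref *session;
	struct ref *pgrp;
};

/* Leaky variant: forgets the pointers without dropping the references. */
static void hangup_leaky(struct holder *h)
{
	h->session = NULL;
	h->pgrp = NULL;
}

/* Fixed variant: drop each reference before clearing the pointer. */
static void hangup_fixed(struct holder *h)
{
	ref_put(h->session);
	ref_put(h->pgrp);
	h->session = NULL;
	h->pgrp = NULL;
}
```

After hangup_leaky() the counts stay elevated with no pointer left to release them through, which is the pattern the added put_pid() calls in the do_tty_hangup hunk address.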

Eric


diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
index e453268..5140f15 100644
--- a/drivers/char/tty_io.c
+++ b/drivers/char/tty_io.c
@@ -1376,6 +1376,8 @@ static void do_tty_hangup(struct work_struct *work)
read_unlock(&tasklist_lock);
 
tty->flags = 0;
+   put_pid(tty->session);
+   put_pid(tty->pgrp);
tty->session = NULL;
tty->pgrp = NULL;
tty->ctrl_status = 0;
@@ -2972,9 +2974,7 @@ static int tiocsctty(struct tty_struct *tty, int arg)
/*
 * Steal it away
 */
-   read_lock(&tasklist_lock);
-   session_clear_tty(tty->session);
-   read_unlock(&tasklist_lock);
+   tty_vhangup(tty);
} else {
ret = -EPERM;
goto unlock;
@@ -3850,7 +3850,7 @@ static struct pid *__proc_set_tty(struct task_struct *tsk, struct tty_struct *tt
return old_pgrp;
 }
 
-void proc_set_tty(struct task_struct *tsk, struct tty_struct *tty)
+static void proc_set_tty(struct task_struct *tsk, struct tty_struct *tty)
 {
struct pid *old_pgrp;
 
diff --git a/include/linux/tty.h b/include/linux/tty.h
index dee72b9..5f7a5fb 100644
--- a/include/linux/tty.h
+++ b/include/linux/tty.h
@@ -333,7 +333,6 @@ extern int tty_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 
 extern dev_t tty_devnum(struct tty_struct *tty);
 extern void proc_clear_tty(struct task_struct *p);
-extern void proc_set_tty(struct task_struct *tsk, struct tty_struct *tty);
 extern struct tty_struct *get_current_tty(void);
 
 extern struct mutex tty_mutex;
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 19a385e..84b489a 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1759,12 +1759,16 @@ static inline void flush_unauthorized_files(struct files_struct * files)
}
file_list_unlock();
 
-   /* Reset controlling tty. */
-   if (drop_tty)
-   proc_set_tty(current, NULL);
}
mutex_unlock(&tty_mutex);
 
+   /* Reset controlling tty. */
+   if (drop_tty) {
+   if (current->signal->leader)
+   disassociate_ctty(0);
+   proc_clear_tty(current);
+   }
+
/* Revalidate access to inherited open files. */
 
AVC_AUDIT_DATA_INIT(&ad,FS);
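
The do_tty_hangup() hunk above follows the usual rule for counted
pointers: drop the reference first, then clear the pointer.  A minimal
userspace sketch of that pattern follows.  The names ref_obj, holder,
obj_get and obj_put are hypothetical stand-ins for illustration only;
they are not the kernel's struct pid API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a reference-counted object such as struct pid. */
struct ref_obj {
	int count;
};

/* Take a reference; returns the object for convenient assignment. */
static struct ref_obj *obj_get(struct ref_obj *obj)
{
	if (obj)
		obj->count++;
	return obj;
}

/* Drop a reference; NULL is accepted so callers need no extra check. */
static void obj_put(struct ref_obj *obj)
{
	if (!obj)
		return;
	assert(obj->count > 0);
	obj->count--;
}

/* Hypothetical holder with a counted pointer, like tty->session. */
struct holder {
	struct ref_obj *session;
};

/* Release the reference first, then clear the pointer. */
static void holder_clear(struct holder *h)
{
	obj_put(h->session);
	h->session = NULL;
}
```

Setting the pointer to NULL without the put would leave the count one
too high forever, which is the kind of leak the hunk addresses.  Calling
holder_clear() twice is harmless because the put tolerates NULL.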
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 03/26] Xen-paravirt_ops: use paravirt_nop to consistently mark no-op operations

2007-03-16 Thread Chris Wright
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
> Chris Wright wrote:
> > how about __paravirt_nop_start < func < __paravirt_nop_end  and preserve
> > the types?
> >   
> 
> Er?  The reason for the (void *) cast is to stop gcc complaining about
> mismatched pointer types.

I mean like this (bunch of work, for a type check that we're really ignoring
anyway, but this is the idea...)

diff -r 930fff55070e arch/i386/kernel/paravirt.c
--- a/arch/i386/kernel/paravirt.c   Fri Mar 16 11:09:10 2007 -0700
+++ b/arch/i386/kernel/paravirt.c   Fri Mar 16 14:48:04 2007 -0700
@@ -35,10 +35,60 @@
 #include 
 #include 
 
-/* nop stub */
-void _paravirt_nop(void)
-{
-}
+/* nop stubs */
+void __paravirt_nop pv_nop_arch_setup(void)
+{
+}
+void __paravirt_nop pv_nop_set_lazy_mode(int mode)
+{
+}
+void __paravirt_nop pv_nop_alloc_pt(struct mm_struct *mm, u32 pfn)
+{
+}
+void __paravirt_nop pv_nop_alloc_pd(u32 pfn)
+{
+}
+void __paravirt_nop pv_nop_alloc_pd_clone(u32 pfn, u32 clonepfn, u32 start, u32 count)
+{
+}
+void __paravirt_nop pv_nop_release_pt(u32 pfn)
+{
+}
+void __paravirt_nop pv_nop_release_pd(u32 pfn)
+{
+}
+void __paravirt_nop pv_nop_pte_update(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
+{
+}
+void __paravirt_nop pv_nop_pte_update_defer(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
+{
+}
+void * __paravirt_nop pv_nop_kmap_atomic_pte(struct page *page, enum km_type type)
+{
+}
+void __paravirt_nop pv_nop_load_tr_desc(void)
+{
+}
+void __paravirt_nop pv_nop_activate_mm(struct mm_struct *prev, struct mm_struct *next)
+{
+}
+void __paravirt_nop pv_nop_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
+{
+}
+void __paravirt_nop pv_nop_exit_mmap(struct mm_struct *mm)
+{
+}
+void __paravirt_nop pv_nop_startup_ipi_hook(int phys_apicid, unsigned long start_eip, unsigned long start_esp)
+{
+}
+#ifdef CONFIG_X86_LOCAL_APIC
+void __paravirt_nop pv_nop_setup_boot_clock(void)
+{
+}
+void __paravirt_nop pv_nop_setup_secondary_clock(void)
+{
+}
+#endif
 
 static void __init default_banner(void)
 {
@@ -166,7 +216,7 @@ unsigned paravirt_patch_default(u8 type,
if (opfunc == NULL)
/* If there's no function, patch it with a ud2a (BUG) */
ret = paravirt_patch_insns(site, len, start_ud2a, end_ud2a);
-   else if (opfunc == paravirt_nop)
+   else if (opfunc >= __paravirt_nop_start && opfunc < __paravirt_nop_end)
/* If the operation is a nop, then nop the callsite */
ret = paravirt_patch_nop();
else if (type == PARAVIRT_PATCH(iret) ||
@@ -521,7 +571,7 @@ struct paravirt_ops paravirt_ops = {
 
.patch = native_patch,
.banner = default_banner,
-   .arch_setup = paravirt_nop,
+   .arch_setup = pv_nop_arch_setup,
.memory_setup = machine_specific_memory_setup,
.get_wallclock = native_get_wallclock,
.set_wallclock = native_set_wallclock,
@@ -577,7 +627,7 @@ struct paravirt_ops paravirt_ops = {
.setup_boot_clock = setup_boot_APIC_clock,
.setup_secondary_clock = setup_secondary_APIC_clock,
 #endif
-   .set_lazy_mode = paravirt_nop,
+   .set_lazy_mode = pv_nop_set_lazy_mode,
 
.pagetable_setup_start = native_pagetable_setup_start,
.pagetable_setup_done = native_pagetable_setup_done,
@@ -587,24 +637,24 @@ struct paravirt_ops paravirt_ops = {
.flush_tlb_single = native_flush_tlb_single,
.flush_tlb_others = native_flush_tlb_others,
 
-   .alloc_pt = paravirt_nop,
-   .alloc_pd = paravirt_nop,
-   .alloc_pd_clone = paravirt_nop,
-   .release_pt = paravirt_nop,
-   .release_pd = paravirt_nop,
+   .alloc_pt = pv_nop_alloc_pt,
+   .alloc_pd = pv_nop_alloc_pd,
+   .alloc_pd_clone = pv_nop_alloc_pd_clone,
+   .release_pt = pv_nop_release_pt,
+   .release_pd = pv_nop_release_pd,
 
.set_pte = native_set_pte,
.set_pte_at = native_set_pte_at,
.set_pmd = native_set_pmd,
-   .pte_update = paravirt_nop,
-   .pte_update_defer = paravirt_nop,
+   .pte_update = pv_nop_pte_update,
+   .pte_update_defer = pv_nop_pte_update_defer,
 
.ptep_get_and_clear = native_ptep_get_and_clear,
 
 #ifdef CONFIG_HIGHPTE
.kmap_atomic_pte = native_kmap_atomic_pte,
 #else
-   .kmap_atomic_pte = paravirt_nop,
+   .kmap_atomic_pte = pv_nop_kmap_atomic_pte,
 #endif
 
 #ifdef CONFIG_X86_PAE
@@ -627,11 +677,11 @@ struct paravirt_ops paravirt_ops = {
.irq_enable_sysexit = native_irq_enable_sysexit,
.iret = native_iret,
 
-   .dup_mmap = paravirt_nop,
-   .exit_mmap = paravirt_nop,
-   .activate_mm = paravirt_nop,
-
-   .startup_ipi_hook = paravirt_nop,
+   .dup_mmap = pv_nop_dup_mmap,
+   .exit_mmap = pv_nop_exit_mmap,
+   .activate_mm = pv_nop_activate_mm,
+
+   .startup_ipi_hook = pv_nop_startup_ipi_hook,
 };
 
 /*
diff -r 930fff55070e arch/i386/kernel/vmlinux.lds.S
--- a/arch/i386/kernel/vmlinux.lds.SFri Mar 1
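
The idea of testing whether a function address falls between two
section markers comes down to a half-open range check.  A minimal
sketch, assuming plain integer addresses instead of the real
__paravirt_nop_start/__paravirt_nop_end linker symbols (the helper name
addr_in_range is made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True when addr lies in the half-open range [start, end).
 * Both bounds must hold, so the comparisons are joined with &&;
 * joining them with || accepts every address when start < end.
 */
static bool addr_in_range(uintptr_t addr, uintptr_t start, uintptr_t end)
{
	assert(start <= end);
	return addr >= start && addr < end;
}
```

With start = 0x1000 and end = 0x2000, the addresses 0x1000 and 0x1fff
are inside the range while 0x0fff and 0x2000 are not.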

Fwd: Backport RSDL v0.31: Updated RSDL 0.31 backport for 2.6.18.8 & Debian etch x86_64 RSDL 0.31 kernels availability

2007-03-16 Thread Veronique & Vincent
Hi all,

Here are the backported RSDL scheduler patches for a 2.6.18.8 kernel and now 
also for a 2.6.19.7 kernel.

This release includes the original backported 2.6.20.x RSDL 0.31 patch and also 
has a few cleanups to remove unnecessary debian make-kpkg files.
This update also includes a small patch to both 2.6.18 and 2.6.19 kernels 
(http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.20.y.git;a=commit;h=d499ac7a3681e270074e880879d0e0a5ad0849fa).
  Would a new stable release of 2.6.18 and .19 including that patch be worth it?

The provided kernels (only for Debian Etch x86_64) might be useful for making 
load comparisons between Vanilla and CK's RSDL scheduler.

Although I've been able to backport RSDL properly to a 2.6.18 kernel, I'm 
currently hitting a bug in the 2.6.19 backported patch.  Help finding that bug 
would really be appreciated.  A picture of the 2.6.19 BUG in action is available at 
http://linux-dev.qc.ec.gc.ca/kernel/debian/etch/x86_64/2.6.19/bug-2.6.19-rsdl-031.jpg

Official CK RSDL patches available at:
http://ck.kolivas.org/patches/staircase-deadline/
Latest CK RSDL 0.31 announcement:  http://lkml.org/lkml/2007/3/16/173

Again, nice work CK!

-

2.6.18.8 backport RSDL 0.31 patch (contains already Ingo's SMT scheduler fix):
PATCH: http://linux-dev.qc.ec.gc.ca/kernel/rsdl/2.6.18.8-rsdl-0.31.patch
CONFIG: 
http://linux-dev.qc.ec.gc.ca/kernel/debian/etch/x86_64/2.6.18/CONFIG-2.6.18-003
Notes:  Haven't encountered any problems yet.

2.6.19.7 backport RSDL 0.31 patch (contains already Ingo's SMT scheduler fix):
PATCH: http://linux-dev.qc.ec.gc.ca/kernel/rsdl/2.6.19.7-rsdl-0.31.patch
CONFIG: 
http://linux-dev.qc.ec.gc.ca/kernel/debian/etch/x86_64/2.6.19/CONFIG-2.6.19-001
Notes:  There is currently a bug in the backported 2.6.19 RSDL 0.31 kernel.  
I've attached a picture of the kernel BUG: call.  Help tracing this bug would 
really be appreciated since I'm not much of a kernel hacker yet!  Also note 
that libata PATA is not enabled and that GFS + DLM is enabled.

2.6.20.3 uses the official RSDL 0.31 from CK:
CONFIG: 
http://linux-dev.qc.ec.gc.ca/kernel/debian/etch/x86_64/2.6.20/CONFIG-2.6.20-001
Notes:  Uses the official RSDL 0.31 patch from CK.  Libata PATA not enabled, 
GFS + DLM enabled, KVM enabled.

Ingo's SMT scheduler fix (already included in stable 2.6.20.3):
http://linux-dev.qc.ec.gc.ca/kernel/rsdl/fix_SMT_scheduler_bug.patch


Pre-compiled Debian Etch 4.0 x86_64 kernels:
2.6.18.8 + SMT sched fix + RSDL 0.31:
http://linux-dev.qc.ec.gc.ca/kernel/debian/etch/x86_64/2.6.18/linux-image-2.6.18.8-003-rsdl-0.31-amd64-envcan_2.6.18.8-003-rsdl-0.31_amd64.deb
http://linux-dev.qc.ec.gc.ca/kernel/debian/etch/x86_64/2.6.18/linux-headers-2.6.18.8-003-rsdl-0.31-amd64-envcan_2.6.18.8-003-rsdl-0.31_amd64.deb

2.6.18.8 + SMT sched fix:
http://linux-dev.qc.ec.gc.ca/kernel/debian/etch/x86_64/2.6.18/linux-image-2.6.18.8-003p1-amd64-envcan_2.6.18.8-003p1_amd64.deb
http://linux-dev.qc.ec.gc.ca/kernel/debian/etch/x86_64/2.6.18/linux-headers-2.6.18.8-003p1-amd64-envcan_2.6.18.8-003p1_amd64.deb

2.6.18.8 (Vanilla):
http://linux-dev.qc.ec.gc.ca/kernel/debian/etch/x86_64/2.6.18/linux-image-2.6.18.8-003-amd64-envcan_2.6.18.8-003_amd64.deb
http://linux-dev.qc.ec.gc.ca/kernel/debian/etch/x86_64/2.6.18/linux-headers-2.6.18.8-003-amd64-envcan_2.6.18.8-003_amd64.deb

2.6.19.7 + SMT sched fix + RSDL 0.31:
http://linux-dev.qc.ec.gc.ca/kernel/debian/etch/x86_64/2.6.19/linux-image-2.6.19.7-001-rsdl-0.31-amd64-envcan_2.6.19.7-001-rsdl-0.31_amd64.deb
http://linux-dev.qc.ec.gc.ca/kernel/debian/etch/x86_64/2.6.19/linux-headers-2.6.19.7-001-rsdl-0.31-amd64-envcan_2.6.19.7-001-rsdl-0.31_amd64.deb

2.6.19.7 + SMT sched fix:
http://linux-dev.qc.ec.gc.ca/kernel/debian/etch/x86_64/2.6.19/linux-image-2.6.19.7-001p1-amd64-envcan_2.6.19.7-001p1_amd64.deb
http://linux-dev.qc.ec.gc.ca/kernel/debian/etch/x86_64/2.6.19/linux-headers-2.6.19.7-001p1-amd64-envcan_2.6.19.7-001p1_amd64.deb

2.6.20.3 + RSDL 0.31:
http://linux-dev.qc.ec.gc.ca/kernel/debian/etch/x86_64/2.6.20/linux-image-2.6.20.3-001-rsdl-0.31-amd64-envcan_2.6.20.3-001-rsdl-0.31_amd64.deb
http://linux-dev.qc.ec.gc.ca/kernel/debian/etch/x86_64/2.6.20/linux-headers-2.6.20.3-001-rsdl-0.31-amd64-envcan_2.6.20.3-001-rsdl-0.31_amd64.deb

2.6.20.3 (Vanilla)
http://linux-dev.qc.ec.gc.ca/kernel/debian/etch/x86_64/2.6.20/linux-image-2.6.20.3-001-amd64-envcan_2.6.20.3-001_amd64.deb
http://linux-dev.qc.ec.gc.ca/kernel/debian/etch/x86_64/2.6.20/linux-headers-2.6.20.3-001-amd64-envcan_2.6.20.3-001_amd64.deb

Comments are welcome!

Vincent Fortier
Informatique
Environnement Canada




Re: RSDL v0.31

2007-03-16 Thread Al Boldi
Con Kolivas wrote:
> Here are full patches for rsdl 0.31 for various base kernels. A full
> announce with a fresh -mm series will follow...
>
> http://ck.kolivas.org/patches/staircase-deadline/2.6.20.3-rsdl-0.31.patch

Thanks!  It looks much better now.

With X nice'd at -10, and 11 hogs loading the cpu, interactivity looks good 
until the default timeslice/quota is exhausted and slows down.  Maybe 
adjusting this according to nice could help.

It may also be advisable to fix latencies according to nice, and adjust 
timeslices instead.  This may help scalability a lot, as there are some 
timing-sensitive apps that may crash under high load.


Thanks!

--
Al



New kernel mouse recognition problem

2007-03-16 Thread Victor Fernandes
Dear kernel gurus,

I have long experience with Linux but not at the kernel level, so my
apologies if this post is not appropriate for the list, but it seemed to
me to be the only possible place to post my question.

Obviously I've also tried to find the solution on the archives (and more)
but found nothing appropriate.

Problem: It appears that the new kernels (I currently have kernel 2.6.17-5
(Mandriva 2007) installed, and have tested others in the 2.6.x range) do not
recognize my "Track Point" mouse anymore. The boot logs (syslog) contain
the following message: "logips2pp: Detected unknown logitech
mouse model 0".

The same system with a kernel 2.6.12-12 (Mandriva 10.1) worked properly.

Is my mouse too old? Is this a bug? Are there any possibilities to solve
the problem?

Please Cc me directly on your replies because I'm not subscribed to the
list; I hope this is OK. Obviously I'm more than happy to provide any
required information.

Thank you,

Victor Fernandes




Re: [patch 17/26] Xen-paravirt_ops: Add nosegneg capability to the vsyscall page notes

2007-03-16 Thread Jeremy Fitzhardinge
Roland McGrath wrote:
> This should be:
>
> NOTE_KERNELCAP_BEGIN(1, 1)
> NOTE_KERNELCAP(0, "nosegneg")
> NOTE_KERNELCAP_END
>
> i.e. 1->0 in the "bit" member.  (Note the ld.so.conf.d file must have the
> matching bit number for ldconfig-based lookups to do the right thing.)
> Or else:
>
> NOTE_KERNELCAP_BEGIN(1, 2)
> NOTE_KERNELCAP(0, "nosegneg")
> NOTE_KERNELCAP_END
>
> i.e. 1->2 in the "mask" member.  (The mask value should be 1<   

Thanks Roland.  I've never really understood this stuff, and I just
copied this cargo-cultishly.

I'm not quite sure what you're suggesting here though.  Do you mean one of:

NOTE_KERNELCAP_BEGIN(1, 1)
NOTE_KERNELCAP(0, "nosegneg")
NOTE_KERNELCAP_END

or

NOTE_KERNELCAP_BEGIN(1, 2)
NOTE_KERNELCAP(1, "nosegneg")
NOTE_KERNELCAP_END

is the correct thing to use?

> Some pre-release glibc's (before 2.4) had a bug in the code that parses
> this, and would crash parsing the correct note.  Using the wrong bit value
> with nonmatching mask worked around this.  IIRC, no glibc release ever
> included the buggy version of the code.  In nonbuggy glibc, the mismatched
> value causes the "nosegneg" to be omitted from the directory search (under
> LD_LIBRARY_PATH and default directories), though ldconfig-based lookups
> will work (the most common case).
>   

Are you saying that one of the corrected forms might cause old glibcs to
crash, or just ignore nosegneg?

J



decrease L2 cache size used by Linux?

2007-03-16 Thread Al Boldi
Guerreiro da Luz wrote:
> Am benchmarking dumb matrix multiplication in trying to perceive
> performance drop in case when matrix cannot fit in L2 cache.  However,
> on my machine L2 cache is large - 2MB, so 512x512 matrix of double
> numbers is needed to fill the cache, and in that case multiplication
> is taking rather long time.  So I'm wondering is there a way to
> control (decrease) L2 cache size used by Linux?  Apologies if question
> inappropriate for the list.

You may want to turn off randomization,
 
echo 0 > /proc/sys/kernel/randomize_va_space
echo 1 > /proc/sys/vm/vdso_enabled

And see if that helps.
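
For reference, the sizing in the question can be checked with a few
lines of arithmetic.  This is only a sketch under the assumptions stated
in the question (a 2MB L2 cache, one square matrix of double); the
function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to store one n x n matrix of double. */
static size_t matrix_bytes(size_t n)
{
	/* Guard against size_t overflow before multiplying. */
	assert(n == 0 || n <= SIZE_MAX / sizeof(double) / n);
	return n * n * sizeof(double);
}

/* Largest n such that one n x n matrix of double fits in cache_bytes. */
static size_t largest_fitting_dim(size_t cache_bytes)
{
	size_t n = 0;

	while (matrix_bytes(n + 1) <= cache_bytes)
		n++;
	return n;
}
```

With 8-byte doubles, matrix_bytes(512) is exactly 2MB, which matches
the figure quoted above.  A multiplication touches three such matrices,
so the working set exceeds the cache well before a single matrix does.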


Thanks!

--
Al



Re: [PATCH 00/18] Make common x86 arch area for i386 and x86_64 - Take 2

2007-03-16 Thread Linus Torvalds


On Fri, 16 Mar 2007, Christoph Lameter wrote:
> 
> Yes he has already explained it and I am well aware of the difficulties 
> on 32 bit. -> linux-mm archives.

Stop pointing to archives.

If you cannot give a http pointer to a specific thread, don't bother with 
the "please read the list" thing AT ALL.

And I'm sorry, "we decided this on linux-mm" doesn't cut it as an 
explanation _or_ as a "it's already been decided". Many relevant people 
simply aren't on that mailing list.

Linus


ASR: Address Space Randomization (was: [RFC, PATCH] Fixup COMPAT_VDSO to work with CONFIG_PARAVIRT)

2007-03-16 Thread Al Boldi
ebiederm wrote:
> I'm tempted to rant on the pure insanity of address space randomization
> but that is a whole other issue...

Please do rant; all I can see asr brings is one big performance hit.

Of course, it's not enough to just attack this at the kernel, but glibc has 
to play accordingly as well.


Thanks!

--
Al



Re: [PATCH] hrtimer: prevent overrun DoS in hrtimer_forward()

2007-03-16 Thread Thomas Gleixner
On Fri, 2007-03-16 at 12:43 -0800, Andrew Morton wrote:
> On Wed, 14 Mar 2007 11:00:12 +0100 Thomas Gleixner <[EMAIL PROTECTED]> wrote:
> 
> > hrtimer_forward() does not check for the possible overflow of
> > timer->expires. This can happen on 64 bit machines with large interval
> > values and results currently in an endless loop in the softirq because
> > the expiry value becomes negative and therefore the timer is expired all
> > the time.
> > 
> > Check for this condition and set the expiry value to the max. expiry
> > time in the future.
> > 
> > The fix should be applied to stable kernel series as well.
> > 
> > Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED],de>
> > 
> > diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> > index ec4cb9f..5e7122d 100644
> > --- a/kernel/hrtimer.c
> > +++ b/kernel/hrtimer.c
> > @@ -644,6 +644,12 @@ hrtimer_forward(struct hrtimer *timer, k
> > orun++;
> > }
> > timer->expires = ktime_add(timer->expires, interval);
> > +   /*
> > +* Make sure, that the result did not wrap with a very large
> > +* interval.
> > +*/
> > +   if (timer->expires.tv64 < 0)
> > +   timer->expires = ktime_set(KTIME_SEC_MAX, 0);
> >  
> > return orun;
> >  }
> 
> kernel/hrtimer.c: In function 'hrtimer_forward':
> kernel/hrtimer.c:652: warning: overflow in implicit constant conversion
> 
> problem is, KTIME_SEC_MAX is 9,223,372,036 and ktime_set() takes a `long'.

Stupid me :(

> This?
> 
> --- a/include/linux/ktime.h~ktime_set-fix-arg-type
> +++ a/include/linux/ktime.h
> @@ -72,13 +72,13 @@ typedef union {
>   *
>   * Return the ktime_t representation of the value
>   */
> -static inline ktime_t ktime_set(const long secs, const unsigned long nsecs)
> +static inline ktime_t ktime_set(const s64 secs, const unsigned long nsecs)
>  {
>  #if (BITS_PER_LONG == 64)
>   if (unlikely(secs >= KTIME_SEC_MAX))
>   return (ktime_t){ .tv64 = KTIME_MAX };
>  #endif
> - return (ktime_t) { .tv64 = (s64)secs * NSEC_PER_SEC + (s64)nsecs };
> + return (ktime_t) { .tv64 = secs * NSEC_PER_SEC + (s64)nsecs };
>  }
>  
>  /* Subtract two ktime_t variables. rem = lhs -rhs: */
> _
> 
> I worry about that `secs >= KTIME_SEC_MAX' comparison in there, too.  Both
> operands are signed.

I'd prefer this one: The maximum seconds value we can handle on 32bit is
LONG_MAX.

diff --git a/include/linux/ktime.h b/include/linux/ktime.h
index c68c7ac..248305b 100644
--- a/include/linux/ktime.h
+++ b/include/linux/ktime.h
@@ -57,7 +57,11 @@ typedef union {
 } ktime_t;
 
 #define KTIME_MAX  ((s64)~((u64)1 << 63))
-#define KTIME_SEC_MAX  (KTIME_MAX / NSEC_PER_SEC)
+#if (BITS_PER_LONG == 64)
+# define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC)
+#else
+# define KTIME_SEC_MAX LONG_MAX
+#endif
 
 /*
  * ktime_t definitions when using the 64-bit scalar representation:
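
The two numbers at the heart of this exchange can be reproduced in
userspace.  The sketch below is an illustration, not the kernel code:
it uses int64_t in place of the kernel's s64, the helper names are made
up, and it saturates by testing for overflow before the addition rather
than checking for a negative result afterwards, because signed overflow
is undefined in ISO C.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL
#define KTIME_MAX    INT64_MAX	/* same value as ((s64)~((u64)1 << 63)) */

/* Largest whole-second count representable in a 64-bit nanosecond value. */
static int64_t ktime_sec_max_64(void)
{
	return KTIME_MAX / NSEC_PER_SEC;
}

/*
 * Add a non-negative interval to an expiry time, saturating at KTIME_MAX
 * instead of wrapping negative.
 */
static int64_t expiry_add_saturating(int64_t expires, int64_t interval)
{
	assert(expires >= 0 && interval >= 0);
	if (interval > KTIME_MAX - expires)
		return KTIME_MAX;
	return expires + interval;
}
```

ktime_sec_max_64() yields 9223372036, which does not fit in a 32-bit
long; that is the source of the "overflow in implicit constant
conversion" warning quoted above.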




[PATCH] cleanpatch: a script to clean up stealth whitespace added by a patch

2007-03-16 Thread H. Peter Anvin
This script is a companion to the "cleanfile" script.  This cleans
up a patch in unified diff format *before* it is applied.  Note that
the empty lines at the end of file detection *requires* that the diff was
taken with at least one line of context around each hunk, or bad things
will happen.

This script cleans up various classes of stealth whitespace.  In
particular, it cleans up:

- Whitespace (spaces or tabs) before newline;
- DOS line endings (CR before LF);
- Space before tab (spaces are deleted or converted to tabs);
- Empty lines at end of file.

Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]>
---
 scripts/cleanpatch |  206 
 1 files changed, 206 insertions(+), 0 deletions(-)
 create mode 100755 scripts/cleanpatch

diff --git a/scripts/cleanpatch b/scripts/cleanpatch
new file mode 100755
index 000..a53f987
--- /dev/null
+++ b/scripts/cleanpatch
@@ -0,0 +1,206 @@
+#!/usr/bin/perl -w
+#
+# Clean a patch file -- or directory of patch files -- of stealth whitespace.
+# WARNING: this can be a highly destructive operation.  Use with caution.
+#
+
+use bytes;
+use File::Basename;
+
+#
+# Clean up space-tab sequences, either by removing spaces or
+# replacing them with tabs.
+sub clean_space_tabs($)
+{
+no bytes;  # Tab alignment depends on characters
+
+my($li) = @_;
+my($lo) = '';
+my $pos = 0;
+my $nsp = 0;
+my($i, $c);
+
+for ($i = 0; $i < length($li); $i++) {
+   $c = substr($li, $i, 1);
+   if ($c eq "\t") {
+   my $npos = ($pos+$nsp+8) & ~7;
+   my $ntab = ($npos >> 3) - ($pos >> 3);
+   $lo .= "\t" x $ntab;
+   $pos = $npos;
+   $nsp = 0;
+   } elsif ($c eq "\n" || $c eq "\r") {
+   $lo .= " " x $nsp;
+   $pos += $nsp;
+   $nsp = 0;
+   $lo .= $c;
+   $pos = 0;
+   } elsif ($c eq " ") {
+   $nsp++;
+   } else {
+   $lo .= " " x $nsp;
+   $pos += $nsp;
+   $nsp = 0;
+   $lo .= $c;
+   $pos++;
+   }
+}
+$lo .= " " x $nsp;
+return $lo;
+}
+
+$name = basename($0);
+
+foreach $f ( @ARGV ) {
+print STDERR "$name: $f\n";
+
+if (! -f $f) {
+   print STDERR "$f: not a file\n";
+   next;
+}
+
+if (!open(FILE, '+<', $f)) {
+   print STDERR "$name: Cannot open file: $f: $!\n";
+   next;
+}
+
+binmode FILE;
+
+# First, verify that it is not a binary file; consider any file
+# with a zero byte to be a binary file.  Is there any better, or
+# additional, heuristic that should be applied?
+$is_binary = 0;
+
+while (read(FILE, $data, 65536) > 0) {
+   if ($data =~ /\0/) {
+   $is_binary = 1;
+   last;
+   }
+}
+
+if ($is_binary) {
+   print STDERR "$name: $f: binary file\n";
+   next;
+}
+
+seek(FILE, 0, 0);
+
+$in_bytes = 0;
+$out_bytes = 0;
+
+@lines  = ();
+
+$in_hunk = 0;
+$err = 0;
+
+while ( defined($line = <FILE>) ) {
+   $in_bytes += length($line);
+
+   if (!$in_hunk) {
+   if ($line =~ /^\@\@\s+\-([0-9]+),([0-9]+)\s+\+([0-9]+),([0-9]+)\s\@\@/) {
+   $minus_lines = $2;
+   $plus_lines = $4;
+   if ($minus_lines || $plus_lines) {
+   $in_hunk = 1;
+   @hunk_lines = ($line);
+   }
+   } else {
+   push(@lines, $line);
+   $out_bytes += length($line);
+   }
+   } else {
+   # We're in a hunk
+
+   if ($line =~ /^\+/) {
+   $plus_lines--;
+
+   $text = substr($line, 1);
+   $text =~ s/[ \t\r]*$//; # Remove trailing spaces
+   $text = clean_space_tabs($text);
+
+   push(@hunk_lines, '+'.$text);
+   } elsif ($line =~ /^\-/) {
+   $minus_lines--;
+   push(@hunk_lines, $line);
+   } elsif ($line =~ /^ /) {
+   $plus_lines--;
+   $minus_lines--;
+   push(@hunk_lines, $line);
+   } else {
+   print STDERR "$name: $f: malformed patch\n";
+   $err = 1;
+   last;
+   }
+
+   if ($plus_lines < 0 || $minus_lines < 0) {
+   print STDERR "$name: $f: malformed patch\n";
+   $err = 1;
+   last;
+   } elsif ($plus_lines == 0 && $minus_lines == 0) {
+   # End of a hunk.  Process this hunk.
+   my $i;
+   my $l;
+   my @h = ();
+   my $adj = 0;
+   my $done = 0;
+
+   for ($i = scalar(@hunk_lines)-1; $i > 0; $i--) {
+   $l = $hunk_lines[$i];
+   if (!$done && $l eq "+\n") {
+   $adj++; # Skip this line
+   } elsif ($l =~ /^[ +]/) {
+   $done = 1;
+ 
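
The tab-stop arithmetic in clean_space_tabs() is compact enough to be
worth spelling out.  Below is a rough C transcription of just those two
expressions, assuming 8-column tab stops as the script does; the
function names are made up for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Column of the first 8-wide tab stop strictly after column col. */
static unsigned next_tab_stop(unsigned col)
{
	assert(col <= UINT_MAX - 8u);
	return (col + 8u) & ~7u;
}

/*
 * Number of tab characters that move the cursor from column pos to the
 * tab stop that "nsp pending spaces followed by one tab" would reach.
 */
static unsigned tabs_needed(unsigned pos, unsigned nsp)
{
	unsigned npos = next_tab_stop(pos + nsp);

	return (npos >> 3) - (pos >> 3);
}
```

Three spaces followed by a tab at column 0 collapse to a single tab
(the spaces are simply deleted), while eight spaces followed by a tab
become two tabs (the spaces are converted), matching the "deleted or
converted to tabs" wording in the description.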

Re: [PATCH 2/9] Sched clock paravirt op fix.patch

2007-03-16 Thread Matt Mackall
On Tue, Mar 13, 2007 at 02:05:11PM -0700, Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
> > It depends -- under heavy network load you can spend a long time
> > just processing interrupts.
> 
> Well, in that case you probably don't want to charge them to the process
> which happens to be running at the time.

It's actually a good first-order approximation of the right thing to
do, as it will generally correlate with the userspace process
servicing that network load.

If not (for instance, with routing loads), then you'd basically expect
the charge to get spread around evenly in proportion to an
application's CPU usage.

The -rt kernel pushes most of the interrupt work off to threads, which
of course follow the same scheduling and accounting rules as
every other thread.

-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH] sata_inic162x: kill double region requests

2007-03-16 Thread Nate Riffe
Tejun Heo said this (probably recently):
> Regions are requested twice during initialization causing the second
> one to fail.  This is regression introduced during iomap conversion.
> 
> Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
> ---
> Nate, this should fix it.  But LBA48 support is broken on the
> controller.

It does indeed allow the driver to enable the card.  The two attached
drives do appear in /proc/scsi/scsi now, but they do not seem to be
bound to major/minor numbers.  If I read the discussion correctly, I
should at least be able to write garbage to the disk and read
(possibly different) garbage back, but that is not the case.

I've included the complete dmesg output from a kernel without the
extra region requests.  I'm not sure what other information would be
useful at this juncture.

-Nate

Linux version 2.6.21-rc3 ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 
(prerelease) (Debian 4.1.1-21)) #1 SMP Thu Mar 15 23:25:19 EDT 2007
BIOS-provided physical RAM map:
sanitize start
sanitize end
copy_e820_map() start:  size: 0009e800 end: 
0009e800 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 0009e800 size: 1800 end: 
000a type: 2
copy_e820_map() start: 000e7000 size: 00019000 end: 
0010 type: 2
copy_e820_map() start: 0010 size: 1fefdc00 end: 
1fffdc00 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 1fffdc00 size: 2000 end: 
1c00 type: 3
copy_e820_map() start: 1c00 size: 0400 end: 
2000 type: 4
copy_e820_map() start: fffe7000 size: 00019000 end: 
0001 type: 2
 BIOS-e820:  - 0009e800 (usable)
 BIOS-e820: 0009e800 - 000a (reserved)
 BIOS-e820: 000e7000 - 0010 (reserved)
 BIOS-e820: 0010 - 1fffdc00 (usable)
 BIOS-e820: 1fffdc00 - 1c00 (ACPI data)
 BIOS-e820: 1c00 - 2000 (ACPI NVS)
 BIOS-e820: fffe7000 - 0001 (reserved)
0MB HIGHMEM available.
511MB LOWMEM available.
Entering add_active_range(0, 0, 131069) 0 entries of 256 used
Zone PFN ranges:
  DMA 0 -> 4096
  Normal   4096 ->   131069
  HighMem131069 ->   131069
early_node_map[1] active PFN ranges
0:0 ->   131069
On node 0 totalpages: 131069
  DMA zone: 32 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 4064 pages, LIFO batch:0
  Normal zone: 991 pages used for memmap
  Normal zone: 125982 pages, LIFO batch:31
  HighMem zone: 0 pages used for memmap
DMI 2.1 present.
ACPI: RSDP 000F6B00, 0014 (r0 PTLTD )
ACPI: RSDT 1FFFDD5D, 0028 (r1 PTLTDRSDT  0 PTL   100)
ACPI: FACP 1B8C, 0074 (r1 INTEL  SEATTLE2 19990830 PTL F4240)
ACPI: DSDT 1FFFDD85, 1E07 (r1  Intel  S2440BX0 MSFT  104)
ACPI: FACS 1FC0, 0040
ACPI: PM-Timer IO Port: 0x8008
Allocating PCI resources starting at 3000 (gap: 2000:dffe7000)
Built 1 zonelists.  Total pages: 130046
Kernel command line: root=/dev/hda1 ro 
Local APIC disabled by BIOS -- you can enable it with "lapic"
mapped APIC to d000 (0140a000)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 2048 (order: 11, 8192 bytes)
Detected 548.753 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 514244k/524276k available (1547k kernel code, 9480k reserved, 588k 
data, 200k init, 0k highmem)
virtual kernel memory layout:
fixmap  : 0xfff4f000 - 0xf000   ( 704 kB)
pkmap   : 0xff80 - 0xffc0   (4096 kB)
vmalloc : 0xe080 - 0xff7fe000   ( 495 MB)
lowmem  : 0xc000 - 0xdfffd000   ( 511 MB)
  .init : 0xc031d000 - 0xc034f000   ( 200 kB)
  .data : 0xc0282c7d - 0xc0316014   ( 588 kB)
  .text : 0xc010 - 0xc0282c7d   (1547 kB)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 1098.41 BogoMIPS (lpj=2196828)
Security Framework v1.0.0 initialized
SELinux:  Disabled at boot.
Capability LSM initialized
Mount-cache hash table entries: 512
CPU: After generic identify, caps: 0387f9ff     
 
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 512K
CPU serial number disabled.
CPU: After all inits, caps: 0383f9ff   0040  
 
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Compat vDSO mapped to e000.
Checking 'hlt' instruction... OK.
SMP alternatives: switching to UP code
Freeing SMP alternatives: 11k freed
ACPI: Core revision 20070126
ACPI: setting ELCR to 0200 (from 8e00)
CPU0: Intel Pentium III (Ka

Re: [patch 17/26] Xen-paravirt_ops: Add nosegneg capability to the vsyscall page notes

2007-03-16 Thread Roland McGrath
> > +NOTE_KERNELCAP_BEGIN(1, 1)
> > +NOTE_KERNELCAP(1, "nosegneg")
> > +NOTE_KERNELCAP_END

This should be:

NOTE_KERNELCAP_BEGIN(1, 1)
NOTE_KERNELCAP(0, "nosegneg")
NOTE_KERNELCAP_END

i.e. 1->0 in the "bit" member.  (Note the ld.so.conf.d file must have the
matching bit number for ldconfig-based lookups to do the right thing.)
Or else:

NOTE_KERNELCAP_BEGIN(1, 2)
NOTE_KERNELCAP(0, "nosegneg")
NOTE_KERNELCAP_END

i.e. 1->2 in the "mask" member.  (The mask value should be 1

Re: Summary of resource management discussion

2007-03-16 Thread Paul Jackson
Herbert wrote:
> looks good to me, except for the potential issue with
> the double indirection introducing too much overhead

It's not the indirection count that I worry about.

It's the scalability of the locking.  We must avoid as
much as possible adding any global locks on key code paths.
This means:
 1) be reluctant to add them to fork/exit
 2) just RCU locks on per-job (or finer grain) data when on
the normal page allocation path
 3) nothing outside the current task context for the normal
task scheduling code path.

A global lock on the wrong code path is fatal for scaling
big NUMA boxes.

... now whether or not that is an issue here, I don't claim
to know.   I'm just worried that it could be.

Atomic data, such as global counters, is just as bad.
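
One common way around a contended global counter is to give each CPU
its own slot and only sum the slots when a total is needed.  The sketch
below is a userspace illustration of that idea and not something
proposed in this thread; NR_SHARDS, CACHE_LINE and all of the names are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SHARDS  8	/* hypothetical CPU count for the sketch */
#define CACHE_LINE 64	/* assumed cache-line size in bytes */

/* One counter slot per CPU, padded so no two slots share a cache line. */
struct counter_shard {
	long value;
	char pad[CACHE_LINE - sizeof(long)];
};

struct sharded_counter {
	struct counter_shard shard[NR_SHARDS];
};

/* Fast path: touches only the caller's own slot, no lock, no shared line. */
static void counter_add(struct sharded_counter *c, unsigned cpu, long delta)
{
	assert(cpu < NR_SHARDS);
	c->shard[cpu].value += delta;
}

/* Slow path: sums all slots; the total is approximate under concurrency. */
static long counter_read(const struct sharded_counter *c)
{
	long sum = 0;

	for (size_t i = 0; i < NR_SHARDS; i++)
		sum += c->shard[i].value;
	return sum;
}
```

The trade-off is that reads become more expensive and only
approximately consistent, which is usually acceptable for statistics
but not for hard limits.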

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access

2007-03-16 Thread Stephen Hemminger
All this discussion is well and good, but I suspect there is a driver setup
problem where the interrupt isn't being handled properly. Please retest with
the latest version of the skge driver (I just pushed patches to netdev about 
2min ago).
One patch keeps the device's packet IRQs disabled until the device
is brought up.



Re: [RFC] kernel/pid.c pid allocation wierdness

2007-03-16 Thread Eric W. Biederman
William Lee Irwin III <[EMAIL PROTECTED]> writes:
>
> On Fri, Mar 16, 2007 at 07:04:28AM -0600, Eric W. Biederman wrote:
>> Grr.  s/patricia tree/fib tree/.  We use that in the networking for
>> the forwarding information base and I mis-remembered it.  Anyway
>> the interesting thing with the binary version of radix tree is that
>> path compression is well defined.  Path compression when you have
>> multi-way branching is much more difficult.
>
> Path compression isn't a big deal for multiway branching. I've usually
> done it by aligning nodes and or'ing the number of levels to skip into
> the lower bits of the pointer.

Hmm.  I guess what I have seen is that it was simply more difficult
because there were fewer opportunities the bigger the branching factor
but I haven't looked at it very closely.
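
The pointer-tagging trick described in the quoted text can be sketched with plain integers standing in for addresses. The alignment value and function names are assumptions made for illustration; nothing here is taken from an existing kernel radix tree.

```python
ALIGN = 64                 # assumed node alignment in bytes (hypothetical)
SKIP_MASK = ALIGN - 1      # low bits that are always zero in an aligned address


def pack(node_addr, skip_levels):
    """Combine an aligned node address with a skip count in its low bits."""
    if node_addr % ALIGN != 0:
        raise ValueError("node address must be aligned")
    if not 0 <= skip_levels <= SKIP_MASK:
        raise ValueError("skip count does not fit in the spare bits")
    return node_addr | skip_levels


def unpack(tagged):
    """Split a tagged pointer back into (node address, levels to skip)."""
    return tagged & ~SKIP_MASK, tagged & SKIP_MASK
```

For example, `pack(0x1000, 3)` yields `0x1003`, and `unpack(0x1003)` recovers the address `0x1000` together with the skip count 3. The larger the alignment, the more levels a single pointer can encode.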

> On Fri, Mar 16, 2007 at 07:04:28AM -0600, Eric W. Biederman wrote:
>> Sure.  One of the reasons to be careful with switching data
>> structures.  Currently the hash tables typically operate at 10:1
>> unused:used entries.  4096 entries and 100 processes.
>
> That would be 40:1, which is "worse" in some senses. That's not
> going to fly well when pid namespaces proliferate.

Agreed, currently the plan is to add a namespace parameter to hash table
compares during lookups.  Allocating hash tables at run time is almost
impossible to do reliably because of the multiple page allocations.

> On Fri, Mar 16, 2007 at 07:04:28AM -0600, Eric W. Biederman wrote:
>> The current work actually focuses on changing the code so we reduce
>> the total number of hash table look ups, by persistently storing
>> struct pid pointers instead of storing a pid_t.  This has a lot
>> of benefits when it comes to implementing a pid namespace but the
>> secondary performance benefit is nice as well.
>
> I can't quite make out what you mean by all that. struct pid is already
> what's in the hashtable.

Yes.  But I have given it a life of its own as well.  Which means instead
of caching a pid_t value in a long lived data structure we can hold 
a struct pid *.  So that means we have fewer total hash table look
ups.
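
A toy model of the difference, assuming a dictionary as a stand-in for the pid hash table and a small dict as a stand-in for struct pid (all names below are hypothetical): a holder that caches only the number pays one table search per use, while a holder that keeps the object pays once.

```python
class PidTable:
    """Toy stand-in for the pid hash table; counts how often it is searched."""

    def __init__(self):
        self._by_nr = {}
        self.lookups = 0

    def attach(self, nr):
        pid = {"nr": nr}  # stand-in for a struct pid
        self._by_nr[nr] = pid
        return pid

    def find(self, nr):
        self.lookups += 1
        return self._by_nr.get(nr)


def use_by_number(table, cached_nr, times):
    """Holder cached only the number: every use goes through the table."""
    for _ in range(times):
        table.find(cached_nr)
    return table.lookups


def use_by_reference(table, nr, times):
    """Holder cached the object: one lookup up front, none afterwards."""
    pid = table.find(nr)
    for _ in range(times):
        _ = pid["nr"]
    return table.lookups
```

In the real kernel the held reference also needs lifetime management, which this sketch leaves out entirely.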

> On Fri, Mar 16, 2007 at 07:04:28AM -0600, Eric W. Biederman wrote:
>> Although my preliminary concern was the increase in typical list
>> traversal length during lookup.  The current hash table typically does
>> not have collisions so normally there is nothing to traverse.
>
> Define "normally;" 1 threads and/or processes can be standard for
> some affairs.

When I did a quick survey of systems I could easily find everything
was much lower than.  I wasn't able to find those setups in my
quick survey.  I was looking for systems with long hash chains to
justify a data structure switch especially systems that needed to
push up the default pid limit, but I didn't encounter them.

So that said to me the common case was well handled by the current
setup.  Especially where even at 1 we only have normal hash chain
lengths of 3 to 4 (3 to 5?).  I did a little modeling and our hash
function was good enough that it generally gave a good distribution of
pid values across the buckets.
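
The kind of modeling mentioned here can be approximated in a few lines. This is a sketch under stated assumptions: the multiplier and the shift are modeled on a generic 32-bit multiplicative hash and are not taken from this thread, and the 4096-bucket table size is the figure quoted earlier in the message.

```python
from collections import Counter

GOLDEN_RATIO_PRIME_32 = 0x9E370001  # assumed multiplier for a 32-bit multiplicative hash
PIDHASH_SHIFT = 12                  # 2**12 = 4096 buckets, the table size quoted above


def pid_hash(nr, bits=PIDHASH_SHIFT):
    """Multiply, truncate to 32 bits, keep the top `bits` bits as the bucket."""
    return ((nr * GOLDEN_RATIO_PRIME_32) & 0xFFFFFFFF) >> (32 - bits)


def chain_lengths(pids, bits=PIDHASH_SHIFT):
    """Return a Counter mapping bucket index -> number of pids hashed into it."""
    return Counter(pid_hash(nr, bits) for nr in pids)


def summarize(pids, bits=PIDHASH_SHIFT):
    """Report bucket usage and chain lengths for a collection of pid numbers."""
    pids = list(pids)
    chains = chain_lengths(pids, bits)
    buckets = 1 << bits
    used = len(chains)
    return {
        "buckets": buckets,
        "used_buckets": used,
        "unused_to_used": (buckets - used) / used,
        "mean_chain": len(pids) / used,
        "max_chain": max(chains.values()),
    }
```

Calling `summarize(range(1, 101))` models the 100-process case, where the unused:used ratio comes out at roughly 40:1 as long as collisions are rare; `summarize(range(1, 32769))` models a table filled up to the default pid_max, where the mean chain cannot be shorter than 32768 / 4096 = 8.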

My memory is something like the really nasty cases only occur when
we start pushing /proc/sys/kernel/pid_max above its default of 32768.

Our worst case has pid hash chains of 1k entries which clearly sucks.

> William Lee Irwin III <[EMAIL PROTECTED]> writes:
>>> RCU'ing radix trees is trendy but the current implementation needs a
>>> spinlock where typically radix trees do not need them for RCU. I'm
>>> talking this over with others interested in lockless radix tree
>>> algorithms for reasons other than concurrency and/or parallelism.
>
> On Fri, Mar 16, 2007 at 07:04:28AM -0600, Eric W. Biederman wrote:
>> Sure.  I was thinking of the general class of data structures not the
>> existing kernel one.  The multi-way branching and lack of comparisons
>> looks interesting as a hash table replacement.  In a lot of cases
>> storing hash values in a radix tree seems a very interesting
>> alternative to a traditional hash table (and in that case the keys
>> are randomly distributed and can easily be made dense).   For pids
>> hashing the values first seems like a waste, and 
>
> Comparisons vs. no comparisons is less interesting than cachelines
> touched. B+ would either make RCU inconvenient or want comparisons by
> dereferencing, which raises the number of cachelines touched.

Agreed.  Hmm.  I didn't say that too well, I was thinking of the lack
of comparisons implying fewer cache line touches.

> William Lee Irwin III <[EMAIL PROTECTED]> writes:
>>> I'd say you already have enough evidence to motivate a change of data
>>> structure.
>
> On Fri, Mar 16, 2007 at 07:04:28AM -0600, Eric W. Biederman wrote:
>> I have enough to know that a better data structure could improve
>> things.  Getting something that is clearly better than what
>> we have now is difficult.  Plus I have a lot of other patches to
>> coordinate.  I don't think the current d

Re: sky2 PHY setup

2007-03-16 Thread Stephen Hemminger
On Fri, 16 Mar 2007 14:36:45 -0600
Rob Sims <[EMAIL PROTECTED]> wrote:

> On Fri, Mar 16, 2007 at 09:59:32AM -0700, Stephen Hemminger wrote:
> > On Fri, 16 Mar 2007 01:29:12 +0100
> > Thomas Glanzmann <[EMAIL PROTECTED]> wrote:
> > 
> > > Hello Stephen,
> > > 
> > > > yesterday I pulled from Linus tree because I saw the sky2 updated and I
> > > > tried to break it but it seems that my problems are gone. I let you know
> > > > if anything pops up in the future.
> > > 
> > > bad news. I today tried the sky2 driver which is in Linus Kernel Tree
> > > (HEAD) on a machine with very high network load and it stopped working
> > > without any kernel messages after doing a flawless job under high load
> > > for 5 hours. My watchdog rebooted the machine after 500 seconds. ;-(
> > > 
> > > Thomas
> > 
> > I have run for 2+ days under load without problems. It is hard to
> > reproduce or do much about your problem without more info.
> 
> Are there some debug hooks that can be activated?  My sky2 stops
> responding (very light load) about twice a day.  The netdev watchdog
> notices after a while and is able to reactivate the interface:
> 
> Mar 15 13:28:12 btd kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Mar 15 13:28:12 btd kernel: sky2 eth0: tx timeout
> Mar 15 13:28:12 btd kernel: sky2 eth0: transmit ring 458 .. 435 report=458 done=458
> Mar 15 13:28:12 btd kernel: sky2 eth0: disabling interface
> Mar 15 13:28:12 btd kernel: sky2 eth0: enabling interface
> Mar 15 13:28:12 btd kernel: sky2 eth0: ram buffer 48K
> Mar 15 13:28:15 btd kernel: sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both

Use ethtool -S to see if there are any pause frames, etc. See if frames are
still making it into PHY statistics but not being received.

Use ethtool -d to dump registers. Need current version of ethtool with decode 
logic.

Then look for things like: is the RAM buffer read/write pointer changing?

Is GMAC stuck in pause:

Normal is:
GMAC 1
Status   0x5010  (see GM_GPSR_XXX in sky2.h)
Control  0x1800

Stuck is
GMAC 1
Status   0x5810 (or 0x5A10)
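
The comparison above can be automated once the status word has been read out of the register dump. A minimal sketch, using only the three values quoted in this message; the bit names live in the GM_GPSR_XXX definitions in sky2.h and are deliberately not guessed here, and the function names are hypothetical.

```python
NORMAL_STATUS = 0x5010             # "Normal" value quoted above
STUCK_STATUSES = (0x5810, 0x5A10)  # "Stuck" values quoted above

# Bit that is set in both stuck values but clear in the normal one.
STUCK_BIT = (STUCK_STATUSES[0] & STUCK_STATUSES[1]) & ~NORMAL_STATUS


def extra_bits(status, baseline=NORMAL_STATUS):
    """Return the bit positions set in `status` but not in `baseline`."""
    diff = status & ~baseline
    return [bit for bit in range(16) if diff & (1 << bit)]


def looks_stuck(status):
    """True if the status word carries the bit common to the stuck examples."""
    return bool(status & STUCK_BIT)
```

With these values `STUCK_BIT` works out to 0x0800 (bit 11); 0x5A10 additionally has bit 9 set relative to the normal value.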

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: Linux 2.6.21-rc4

2007-03-16 Thread Randy Dunlap
On Fri, 16 Mar 2007 09:33:54 -0700 (PDT) Linus Torvalds wrote:

> 
> I pushed out the -git trees yesterday, but then got distracted, so the 
> patches and tar-balls and the announcement got delayed until this morning. 
> Oops. I'm a scatter-brain.

allmodconfig on i386:

WARNING: "default_idle" [arch/i386/kernel/apm.ko] undefined!
WARNING: "machine_real_restart" [arch/i386/kernel/apm.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: [PATCH][RSDL-mm 4/6] sched: dont renice kernel threads

2007-03-16 Thread Con Kolivas
On Saturday 17 March 2007 02:14, Chris Friesen wrote:
> Con Kolivas wrote:
> > The practice of renicing kernel threads to negative nice values is of
> > questionable benefit at best, and at worst leads to larger latencies when
> > kernel threads are busy on behalf of other tasks.
>
> What about the priority implications of the renicing?  It seems a bit
> iffy letting kernel threads compete for cpu time on an equal basis with
> your default shell.

Lots of things we do because we just assume they're a good idea without any 
evidence. Renicing kernel threads was always considered a good idea on this 
basis. I'm certain no one has ever proven that it's a good thing though. 
Either way, the latest version of rsdl is robust enough that it works fine 
with reniced kernel threads if you still believe that's advantageous. This is 
definitely open for discussion/opinion.

-- 
-ck


Re: sky2 PHY setup

2007-03-16 Thread Rob Sims
On Fri, Mar 16, 2007 at 09:59:32AM -0700, Stephen Hemminger wrote:
> On Fri, 16 Mar 2007 01:29:12 +0100
> Thomas Glanzmann <[EMAIL PROTECTED]> wrote:
> 
> > Hello Stephen,
> > 
> > > yesterday I pulled from Linus tree because I saw the sky2 updated and I
> > > tried to break it but it seems that my problems are gone. I let you know
> > > if anything pops up in the future.
> > 
> > bad news. I today tried the sky2 driver which is in Linus Kernel Tree
> > (HEAD) on a machine with very high network load and it stopped working
> > without any kernel messages after doing a flawless job under high load
> > for 5 hours. My watchdog rebooted the machine after 500 seconds. ;-(
> > 
> > Thomas
> 
> I have run for 2+ days under load without problems. It is hard to
> reproduce or do much about your problem without more info.

Are there some debug hooks that can be activated?  My sky2 stops
responding (very light load) about twice a day.  The netdev watchdog
notices after a while and is able to reactivate the interface:

Mar 15 13:28:12 btd kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar 15 13:28:12 btd kernel: sky2 eth0: tx timeout
Mar 15 13:28:12 btd kernel: sky2 eth0: transmit ring 458 .. 435 report=458 done=458
Mar 15 13:28:12 btd kernel: sky2 eth0: disabling interface
Mar 15 13:28:12 btd kernel: sky2 eth0: enabling interface
Mar 15 13:28:12 btd kernel: sky2 eth0: ram buffer 48K
Mar 15 13:28:15 btd kernel: sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both

This machine is a Core2 Duo e6700, and the interface is:
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 20)
to 1 Gb hub.

On a Pentium 4, with:
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 15)
I have no issues, but with a very light network load, 100 Mb/s hub.

Each machine has two identical interfaces; only one has a cable in it.

Both machines can be used for testing/debug.
-- 
Rob




Re: oops in __nodemgr_remove_host_dev (was Re: Ooops with suspend to RAM)

2007-03-16 Thread Stefan Richter
Ismail Dönmez wrote:
> On Thursday 15 March 2007 02:08:43 Stefan Richter wrote:
> [...]
>> Ismail, if you have the opportunity, the next thing you could test would
>> be to unload eth1394 explicitly before ohci1394 on 2.6.21-rc3. This
>> would _not_ oops according to my observation.
> 
> On a clean reboot it works as expected ;
> 
> southpark cartman # rmmod eth1394
> southpark cartman # rmmod ohci1394
> southpark cartman #
> 
> No oops.

I now tested 2.6.20-rc4 with the following two commits reverted:

43cb76d91ee85f579a69d42bc8efc08bac560278
"Network: convert network devices to use struct device instead of
class_device"

40cf67c5fcc513406558c01b91129280208e57bf
"Driver core: remove class_device_rename"

I can now unload ohci1394 again while eth1394 is loaded. The reverting
patch is available at
http://me.in-berlin.de/~s5r6/linux1394/work-in-progress/revert-network-convert-network-devices-to-use-struct-device-instead-of-class_device.patch
(The server may be briefly down tonight and sometime during tomorrow.)

Next thing to do: Find a minimal fix which keeps Greg's net conversions.
-- 
Stefan Richter
-=-=-=== --== =
http://arcgraph.de/sr/


Re: [PATCH 00/18] Make common x86 arch area for i386 and x86_64 - Take 2

2007-03-16 Thread Christoph Lameter
On Fri, 16 Mar 2007, Martin Bligh wrote:

> For starters, you can't do that sparse a mapping on a 32 bit system.
> I'll let Andy explain the rest of it.

Yes he has already explained it and I am well aware of the difficulties 
on 32 bit. -> linux-mm archives.

> "the agreement"? So Andy agreed to taking it out? Or you and Kame did?

Yes Andy and lots of others. Big discussion that you seem to want me to 
repeat here.



Re: RSDL v0.31

2007-03-16 Thread Con Kolivas
On Saturday 17 March 2007 02:34, Mike Galbraith wrote:
> On Sat, 2007-03-17 at 00:40 +1100, Con Kolivas wrote:
> > Here are full patches for rsdl 0.31 for various base kernels. A full
> > announce with a fresh -mm series will follow...
> >
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.20.3-rsdl-0.31.patch
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-sched-rsdl-0.31.patch
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.31.patch
>
> It still has trouble with the x/gforce vs two niced encoders scenario.
> The previously reported choppiness is still present.
>
> I suspect that x/gforce landing in the expired array is the trouble, and
> that this will never be smooth without some kind of exemption.  I added
> some targeted unfairness to .30, and it didn't help much at all.
>
> Priorities going all the way to 1 were a surprise.

It wasn't going to change that case without renicing X. I said from the 
start that renicing X is the only way to give it more cpu while keeping 
the design fair. The major difference in this one is the ability to run 
different nice values without killing the latency of the relatively niced 
ones.

-- 
-ck


Re: [PATCH 00/18] Make common x86 arch area for i386 and x86_64 - Take 2

2007-03-16 Thread Martin Bligh

Christoph Lameter wrote:
> On Fri, 16 Mar 2007, Martin Bligh wrote:
>
> > You have to do some sort of lookup anyway, and Andy seemed to have them
> > all folded into one.
>
> What lookup would you need to do? On x86_64 even the TLB use is 
> hidden by the existing 2M entries for 1-1 mappings.
>
> > Or are you trying to avoid this by going back to the crud we had
> > in 2.4 where we pretend mem_map is one big array, indexed by pfn with
> > huge sparsely mapped holes in it?
>
> Yes that's the advanced way of doing it rather than adding useless custom 
> lookups.

For starters, you can't do that sparse a mapping on a 32 bit system.
I'll let Andy explain the rest of it.

> > Would be nice to work out (and document somewhere) what the pros and
> > cons of virtual memmap vs sparsemem were - ISTR one of the arguments
> > was extremely sparsely laid out machines, and you needed sparsemem
> > for that. But right now we have 3 solutions, which is not a good
> > situation.
>
> Please read my posts to linux-mm on that subject. We discussed it last 
> year in detail and the agreement was that the sparsemem crud needs to be 
> taken out. Kame-san posted patches to do that.

"the agreement"? So Andy agreed to taking it out? Or you and Kame did?

M.




Re: [PATCH 00/18] Make common x86 arch area for i386 and x86_64 - Take 2

2007-03-16 Thread Dave Hansen
On Fri, 2007-03-16 at 13:15 -0700, Christoph Lameter wrote:
> On Fri, 16 Mar 2007, Andi Kleen wrote:
> > > x86_64 is going to acquire more functionality that will not be available 
> > > for i386. We plan f.e. to add virtual memmap support for x86_64. Virtual 
> > 
> > What advantage would that have over the current setup?
> > We already should handle holes between nodes reasonably efficiently
> > and with nonlinear memory even holes inside nodes shouldn't be a problem.
> 
> It is primarily a performance improvement since the sparsemem table 
> lookups would no longer be necessary and it also streamlines other 
> frequent cacheline uses. These page -> page_struct and vice versa 
> operations are key to the performance of various subsystems, among them 
> the slab allocator.

Hi Christoph,

Yeah, those are horribly common operations.  But, have we actually
quantified how sparsemem hurts here?  I know that it looks on the surface
like those lookups should kill you but, as far as I can tell, we've
never been able to show that they actually do.  We were worried on the
NUMAQ that we would see some performance regressions, but it somehow
slightly outperformed discontigmem.

Do you have any hard data that shows vmemmap killing sparsemem on some
of your workloads?  I'd love to try and fix some of the issues if we can
dig them up.
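
For readers following the argument, the two lookup styles under discussion can be contrasted with a simplified model. This is a sketch under loud assumptions: the descriptor size, section shift and base address are made-up values, and real sparsemem encodes its section table differently from the plain dict used here.

```python
PAGE_STRUCT_SIZE = 64       # assumed size of one page descriptor, in bytes
SECTION_SHIFT = 15          # assumed: 2**15 pages per section (hypothetical value)
VMEMMAP_BASE = 0x1000_0000  # assumed virtual base of the linear page array


def sparse_pfn_to_page(pfn, section_table):
    """Table-based lookup: find the section, then index into its own array."""
    section = pfn >> SECTION_SHIFT                 # which section the pfn falls in
    base = section_table[section]                  # extra memory access; KeyError models a hole
    offset = pfn & ((1 << SECTION_SHIFT) - 1)
    return base + offset * PAGE_STRUCT_SIZE


def vmemmap_pfn_to_page(pfn):
    """Linear lookup: pure arithmetic; holes are simply left unmapped."""
    return VMEMMAP_BASE + pfn * PAGE_STRUCT_SIZE


def vmemmap_page_to_pfn(page_addr):
    """Inverse of vmemmap_pfn_to_page."""
    return (page_addr - VMEMMAP_BASE) // PAGE_STRUCT_SIZE
```

The open question raised above is whether the extra table access in the first function is actually measurable on real workloads; the sketch only shows where that access sits.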

One of the reasons I really like sparsemem is that ports to new
architectures are something like 100 lines of code, including all of the
#defines.  It's really ridiculously easy to do.  One of the things that
I really worry about for vmemmap implementations is how complicated they
get to implement.  The fact that we can't easily do it for both i386 and
x86_64 at the same time speaks to this a bit.

-- Dave



Re: [PATCH 00/18] Make common x86 arch area for i386 and x86_64 - Take 2

2007-03-16 Thread Christoph Lameter
On Fri, 16 Mar 2007, David Miller wrote:

> From: Christoph Lameter <[EMAIL PROTECTED]>
> Date: Fri, 16 Mar 2007 13:48:58 -0700 (PDT)
> 
> > Please read my posts to linux-mm on that subject. We discussed it last 
> > year in detail and the agreement was that the sparsemem crud needs to be 
> > taken out. Kame-san posted patches to do that.
> 
> Please don't do that, sparsemem works very well on sparc64 and I
> like the flexibility it gives me.

I am not sure what flexibility you are talking about? The modifications are 
to make sparsemem support virtual memmap as one option. See Kame-san's 
posts. There is no regression here; it's just cutting out the overhead of 
sparsemem from within.



Re: [PATCH 00/18] Make common x86 arch area for i386 and x86_64 - Take 2

2007-03-16 Thread David Miller
From: Christoph Lameter <[EMAIL PROTECTED]>
Date: Fri, 16 Mar 2007 13:56:13 -0700 (PDT)

> On Fri, 16 Mar 2007, David Miller wrote:
> 
> > From: Christoph Lameter <[EMAIL PROTECTED]>
> > Date: Fri, 16 Mar 2007 13:48:58 -0700 (PDT)
> > 
> > > Please read my posts to linux-mm on that subject. We discussed it last 
> > > year in detail and the agreement was that the sparsemem crud needs to be 
> > > taken out. Kame-san posted patches to do that.
> > 
> > Please don't do that, sparsemem works very well on sparc64 and I
> > like the flexibility it gives me.
> 
> I am not sure what flexibility you are talking about? The modifications are 
> to make sparsemem support virtual memmap as one option. See Kame-san's 
> posts. There is no regression here; it's just cutting out the overhead of 
> sparsemem from within.

I would really appreciate a posting that explains what exactly is
going on being sent to linux-arch, this is the first time I myself am
even aware of this idea and work.

Thanks.

