date:20070414

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-14 Thread hui

On Sun, Apr 15, 2007 at 01:27:13PM +1000, Con Kolivas wrote:
...
> Now that you're agreeing my direction was correct you've done the usual Linux 
> kernel thing - ignore all my previous code and write your own version. Oh 
> well, that I've come to expect; at least you get a copyright notice in the 
> bootup and somewhere in the comments give me credit for proving it's 
> possible. Let's give some other credit here too. William Lee Irwin provided 
> the major architecture behind plugsched at my request and I simply finished 
> the work and got it working. He is also responsible for many IRC discussions 
> I've had about cpu scheduling fairness, designs, programming history and code 
> help. Even though he did not contribute code directly to SD, his comments 
> have been invaluable.

Hello folks,

I think the main failure I see here is that Con wasn't included in this design
or privately in the review process. There could have been better co-ownership of the
code. This could also have been done openly on lkml (since this is kind of what
this medium is about to significant degree) so that consensus can happen (Con
can be reasoned with). It would have achieved the same thing but probably more
smoothly if folks just listened, considered an idea and then, in this case,
created something that would allow for experimentation from outsiders in a
fluid fashion.

If these issues aren't fixed, you're going to be stuck with the same kind of
creeping elitism that has gradually killed the FreeBSD project and other BSDs. I can't
comment on the code implementation. I'm focused on other things now that I'm at
NetApp and I can't help out as much as I could. Being former BSDi, I saw these
issues play out firsthand.

A development process like this is likely to exclude smart people from wanting
to contribute to Linux and folks should be conscious of these issues. It's
basically a lot of code and concept that at least two individuals have worked
on (wli and con) only to have it be rejected and then suddenly replaced by
code from a community gatekeeper. In this case, this results in both Con and
Bill Irwin being woefully underutilized.

If I were one of these people, I'd be mighty pissed.

bill

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-14 Thread Davide Libenzi

On Sat, 14 Apr 2007, William Lee Irwin III wrote:

> The two basic attacks on such large priority spaces are the near future
> vs.  far future subdivisions and subdividing the priority space into
> (most often regular) intervals. Subdividing the priority space into
> intervals is the most obvious; you simply use some O(lg(n)) priority
> queue as the bucket discipline in the "time ring," queue by the upper
> bits of the queue priority in the time ring, and by the lower bits in
> the O(lg(n)) bucket discipline.

Sure. If you really need sub-millisecond precision, you can replace the
bucket's list_head with an rb_root. It may not be necessary for a cpu
scheduler, though (still, I haven't read Ingo's code yet).


- Davide



Re: ZFS with Linux: An Open Plea

2007-04-14 Thread Kasper Sandberg

On Fri, 2007-04-13 at 19:18 -0400, David R. Litwin wrote:
> Before I go on, let me apologise. I don't really know what I hope to
> accomplish, beyond trying to garner thoughts (and support?) for the topic.
> 
> Essentially: I want to use Linux and ZFS. I don't particularly care about  
> licences or any of the rest of that nonsense. The code is there; it merely  
> needs to be made to work with Linux. Done and done -- provided I can find  
> some one to do this for me (I'd do it myself, but I haven't the foggiest  
> notion how to go about such a feat).
> 
> By the way, forget about this FUSE business. I don't know why they're  
> bothering: It's not real, it's slow and, in general, silly.
This seems to me to be a rather uninformed, arrogant, and quite stupid
comment.

> 
> What are the thoughts of the Linux community?
> 
> I apologise right now for my intrusion. I am a Linux-nobody; I freely
> admit it. I haven't even subscribed to this list (so do CC me) because I
> don't want to be over-whelmed with the list's glorious posts. But, part of
> Linux is its being a community. If a member of this community (that is, a
> user of Linux) can't ask the others their
> thoughts and opinions, then the community has failed in a large respect.  
> Take this letter as you will.
> 


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-14 Thread William Lee Irwin III

On Fri, 13 Apr 2007, William Lee Irwin III wrote:
>> A binomial heap would likely serve your purposes better than rbtrees.
[...]

On Sat, Apr 14, 2007 at 03:38:04PM -0700, Davide Libenzi wrote:
> Haven't looked at the scheduler code yet, but for a similar problem I use 
> a time ring. The ring has Ns (2 power is better) slots (where tasks are
> queued - in my case they were some sort of timers), and it has a current
> base index (Ib), a current base time (Tb) and a time granularity (Tg). It
> also has a bitmap with bits telling you which slots contain queued tasks.
> An item (task) that has to be scheduled at time T will be queued in the slot:
> S = Ib + min((T - Tb) / Tg, Ns - 1);
> Items with T longer than Ns*Tg will be scheduled in the relative last slot
> (choosing a proper Ns and Tg can minimize this).
> Queueing is O(1) and de-queueing is O(Ns). You can play with Ns and Tg to
> suit your needs.

I used a similar sort of queue in the virtual deadline scheduler I
wrote in 2003 or thereabouts. CFS uses queue priorities with too high
a precision to map directly to this (queue priorities are marked as
"key" in the cfs code and should not be confused with task priorities).
The elder virtual deadline scheduler used millisecond resolution and a
rather different calculation for its equivalent of ->key, which
explains how it coped with a limited priority space.
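
The time ring Davide describes can be sketched in C roughly as follows. This is a minimal model under stated assumptions: Ns is fixed at 64 so that a single uint64_t can serve as the occupancy bitmap, each slot is reduced to an item counter instead of a real list, the slot index is taken modulo Ns (the quoted formula leaves the wrap implicit), and Ib and Tb are never advanced. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NS 64u                    /* number of slots; a power of two */

/* Hypothetical time ring; fields mirror Ns/Ib/Tb/Tg from the description. */
struct time_ring {
	unsigned int ib;          /* current base index (Ib) */
	uint64_t tb;              /* current base time (Tb) */
	uint64_t tg;              /* time granularity per slot (Tg) */
	uint64_t bitmap;          /* bit s set => slot s is non-empty */
	unsigned int count[NS];   /* items per slot; stands in for a list_head */
};

/* S = Ib + min((T - Tb) / Tg, Ns - 1), wrapped modulo Ns. */
static unsigned int ring_slot(const struct time_ring *r, uint64_t t)
{
	uint64_t delta;

	assert(r->tg > 0);
	delta = t > r->tb ? (t - r->tb) / r->tg : 0;
	if (delta > NS - 1)
		delta = NS - 1;   /* far-future items share the last slot */
	return (r->ib + (unsigned int)delta) & (NS - 1);
}

/* O(1): compute the slot, bump its count, mark it in the bitmap. */
static void ring_enqueue(struct time_ring *r, uint64_t t)
{
	unsigned int s = ring_slot(r, t);

	r->count[s]++;
	r->bitmap |= UINT64_C(1) << s;
}

/* O(Ns): scan forward from Ib for the first non-empty slot; -1 if empty. */
static int ring_first(const struct time_ring *r)
{
	for (unsigned int i = 0; i < NS; i++) {
		unsigned int s = (r->ib + i) & (NS - 1);

		if (r->bitmap & (UINT64_C(1) << s))
			return (int)s;
	}
	return -1;
}

/* Remove one item from the earliest non-empty slot. */
static void ring_dequeue_first(struct time_ring *r)
{
	int s = ring_first(r);

	if (s < 0)
		return;
	if (--r->count[s] == 0)
		r->bitmap &= ~(UINT64_C(1) << s);
}
```

With Ib = 10, Tb = 1000 and Tg = 10, an item due at T = 1055 lands in slot 15, one due at 1021 in slot 12, and anything due at or after Tb + 63*Tg is clamped into the ring's last slot, slot 9.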

The two basic attacks on such large priority spaces are the near future
vs.  far future subdivisions and subdividing the priority space into
(most often regular) intervals. Subdividing the priority space into
intervals is the most obvious; you simply use some O(lg(n)) priority
queue as the bucket discipline in the "time ring," queue by the upper
bits of the queue priority in the time ring, and by the lower bits in
the O(lg(n)) bucket discipline. The near future vs. far future
subdivision is maintaining the first N tasks in a low-constant-overhead
structure like a sorted list and the remainder in some other sort of
queue structure intended to handle large numbers of elements gracefully.
The distribution of queue priorities strongly influences which of the
methods is most potent, though it should be clear the methods can be
used in combination.
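
The "upper bits in the time ring, lower bits in an O(lg(n)) bucket" scheme can be sketched as below, with a binary min-heap standing in for the bucket discipline (an rbtree, as Davide suggests, would fill the same role). Sizes and names are hypothetical, the base never advances, and each bucket orders by the full key: inside a non-clamped slot that is equivalent to ordering by the low bits, and it also keeps the clamped last slot correctly ordered.

```c
#include <assert.h>
#include <stdint.h>

#define NSLOTS     8u    /* ring slots (hypothetical size) */
#define LOW_BITS   10u   /* key bits resolved inside a bucket (hypothetical) */
#define BUCKET_CAP 16u   /* fixed capacity keeps the sketch allocation-free */

/* Per-slot binary min-heap: the O(lg(n)) bucket discipline. */
struct bucket {
	uint64_t key[BUCKET_CAP];
	unsigned int n;
};

struct two_level_queue {
	uint64_t base;                 /* key that maps to slot 0 */
	struct bucket slot[NSLOTS];
};

/* Upper bits of (key - base) pick the slot; far-future keys are clamped. */
static unsigned int tlq_slot(const struct two_level_queue *q, uint64_t key)
{
	uint64_t hi = key > q->base ? (key - q->base) >> LOW_BITS : 0;

	return hi < NSLOTS ? (unsigned int)hi : NSLOTS - 1;
}

static void heap_push(struct bucket *b, uint64_t key)
{
	unsigned int i;

	assert(b->n < BUCKET_CAP);
	for (i = b->n++; i > 0 && b->key[(i - 1) / 2] > key; i = (i - 1) / 2)
		b->key[i] = b->key[(i - 1) / 2];   /* sift the hole up */
	b->key[i] = key;
}

static uint64_t heap_pop(struct bucket *b)
{
	uint64_t top, last;
	unsigned int i = 0;

	assert(b->n > 0);
	top = b->key[0];
	last = b->key[--b->n];
	for (;;) {                              /* sift the hole down */
		unsigned int c = 2 * i + 1;

		if (c >= b->n)
			break;
		if (c + 1 < b->n && b->key[c + 1] < b->key[c])
			c++;
		if (b->key[c] >= last)
			break;
		b->key[i] = b->key[c];
		i = c;
	}
	b->key[i] = last;
	return top;
}

static void tlq_insert(struct two_level_queue *q, uint64_t key)
{
	heap_push(&q->slot[tlq_slot(q, key)], key);
}

/* Minimum key: first non-empty slot, then that bucket's heap minimum. */
static int tlq_pop_min(struct two_level_queue *q, uint64_t *out)
{
	for (unsigned int s = 0; s < NSLOTS; s++) {
		if (q->slot[s].n > 0) {
			*out = heap_pop(&q->slot[s]);
			return 1;
		}
	}
	return 0;
}
```

Because every key in slot s is smaller than every key in slot s+1, popping the minimum of the first non-empty bucket yields the global minimum.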

-- wli

Re: [NFS] Merge plans for RPC/RDMA? (Was: Re: [PATCH 000 of 14] knfsd: Preparation for IPv6 support in NFS server.)

2007-04-14 Thread Mike Snitzer

On 4/13/07, Trond Myklebust <[EMAIL PROTECTED]> wrote:

> On Fri, 2007-04-13 at 16:47 -0400, Mike Snitzer wrote:
> > I must be missing something because I don't see _any_ trace of the
> > core RPC over RDMA support (xprtrdma et al), your RPC Transport
> > Switch, or any of the other supporting changes in mainline.  Could
> > you, or others, please clarify the plan for merging RPC/RDMA?
>
> That would be a question for the actual RPC/RDMA developers (i.e. James
> Lentini and Tom Talpey). They haven't submitted any code for review yet.

The reason I asked is there seems to be a catch-22 going on here if
(as Chuck indicated) the NetApp engineers are waiting for the
remaining rpc transport switch patches to be merged.  My naive
understanding is that those remaining transport switch patches aren't
_really_ needed without the RPC/RDMA patches.

Essentially I'm echoing Or Gerlitz's post to openib-general back in December:
http://lists.openfabrics.org/pipermail/general/2006-December/029721.html

If so, it begs the question: why the hold up from the NetApp
engineers?  In the hopes of getting some insight I've cc'd James
Lentini and Tom Talpey.

I'm assuming that all of you obviously want your code merged mainline.
The thing that is puzzling is the non-traditional release management
of this RPC/RDMA and NFS/RDMA code.  Aside from the periodic
announcements of the tarball updates to the sf.net nfs-rdma project,
I've not found any posting/discussion of associated patches to the
various mailing lists (nfs, nfsv4, ofa's general, lkml, etc).

I'm interested in understanding why the reluctance to push for a merge
now (let alone some months ago) given the various successes that have
been seen with the NFS/RDMA effort (Sandia, SC '06, as Chuck noted:
good performance at various test sites).  If the code is holding up
well why the delay in review and merging?  And rather than wait for
the remaining transport switch patches, why not look to merge all of
NFS/RDMA (client) work at the same time?

regards,
Mike

Re: [2.6 patch] the scheduled -EINVAL for invalid timevals in setitimer

2007-04-14 Thread Thomas Gleixner

On Sat, 2007-04-14 at 17:03 +0200, Adrian Bunk wrote:
> As scheduled, do_setitimer() now returns -EINVAL for invalid timeval.
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Acked-by: Thomas Gleixner <[EMAIL PROTECTED]>




Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-14 Thread Con Kolivas

On Saturday 14 April 2007 06:21, Ingo Molnar wrote:
> [announce] [patch] Modular Scheduler Core and Completely Fair Scheduler
> [CFS]
>
> i'm pleased to announce the first release of the "Modular Scheduler Core
> and Completely Fair Scheduler [CFS]" patchset:
>
>http://redhat.com/~mingo/cfs-scheduler/sched-modular+cfs.patch
>
> This project is a complete rewrite of the Linux task scheduler. My goal
> is to address various feature requests and to fix deficiencies in the
> vanilla scheduler that were suggested/found in the past few years, both
> for desktop scheduling and for server scheduling workloads.

The casual observer will be completely confused by what on earth has happened 
here so let me try to demystify things for them.

1. I tried in vain some time ago to push a working extensible pluggable cpu
scheduler framework (based on wli's work) for the linux kernel. It was 
perma-vetoed by Linus and Ingo (and Nick also said he didn't like it) as 
being absolutely the wrong approach and that we should never do that. Oddly 
enough the linux-kernel-mailing list was -dead- at the time and the 
discussion did not make it to the mailing list. Every time I've tried to 
forward it to the mailing list the spam filter decided to drop it so most 
people have not even seen this original veto-forever discussion.

2. Since then I've been thinking/working on a cpu scheduler design that takes
all the guesswork out of scheduling and gives very predictable, as fair
as possible, cpu distribution and latency while preserving as solid 
interactivity as possible within those confines. For weeks now, Ingo has said 
that the interactivity regressions were showstoppers and we should address 
them, never mind the fact that the so-called regressions were purely "it 
slows down linearly with load" which to me is perfectly desirable behaviour. 
While this was not perma-vetoed, I predicted pretty accurately your intent 
was to veto it based on this.

People kept claiming scheduling problems were few and far between but what was 
really happening is users were terrified of lkml and instead used 1. windows 
and 2. 2.4 kernels. The problems were there.

So where are we now? Here is where your latest patch comes in.

As a solution to the many scheduling problems we finally all agree exist, you 
propose a patch that adds 1. a limited pluggable framework and 2. a fairness 
based cpu scheduler policy... o_O

So I should be happy at last now that the things I was promoting you are also 
promoting, right? Well I'll fill in the rest of the gaps and let other people 
decide how I should feel.

> as usual, any sort of feedback, bugreports, fixes and suggestions are
> more than welcome,

In the last 4 weeks I've spent time lying in bed drugged to the eyeballs and 
having trips in and out of hospitals for my condition. I appreciate greatly 
the sympathy and patience from people in this regard. However at one stage I 
virtually begged for support with my attempts and help with the code. Dmitry 
Adamushko is the only person who actually helped me with the code in the 
interim, while others poked sticks at it. Sure the sticks helped at times but 
the sticks always seemed to have their ends kerosene doused and flaming for 
reasons I still don't get. No other help was forthcoming.

Now that you're agreeing my direction was correct you've done the usual Linux 
kernel thing - ignore all my previous code and write your own version. Oh 
well, that I've come to expect; at least you get a copyright notice in the 
bootup and somewhere in the comments give me credit for proving it's 
possible. Let's give some other credit here too. William Lee Irwin provided 
the major architecture behind plugsched at my request and I simply finished 
the work and got it working. He is also responsible for many IRC discussions 
I've had about cpu scheduling fairness, designs, programming history and code 
help. Even though he did not contribute code directly to SD, his comments 
have been invaluable.

So let's look at the code.

kernel/sched.c
kernel/sched_fair.c
kernel/sched_rt.c

It turns out this is not a pluggable cpu scheduler framework at all, and I 
guess you didn't really promote it as such. It's a "modular scheduler core". 
Which means you moved code from sched.c into sched_fair.c and sched_rt.c. 
This abstracts out each _scheduling policy's_ functions into struct 
sched_class and allows each scheduling policy's functions to be in a separate 
file etc.
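
To make the policy-versus-scheduler distinction concrete, here is a toy model in C of a per-policy operations struct: each policy fills in function pointers, and a fixed core loop asks the policies in priority order for the next task. Every name below (policy_class, core_pick_next, the single-slot queues) is hypothetical and only loosely inspired by struct sched_class; it is not the CFS code.

```c
#include <assert.h>
#include <stddef.h>

struct task { int id; };

/* One struct of operations per scheduling policy (hypothetical names). */
struct policy_class {
	const char *name;
	const struct policy_class *next;   /* next lower-priority policy */
	void (*enqueue)(struct task *t);
	struct task *(*pick_next)(void);   /* NULL when nothing is runnable */
};

/* Toy run queues: one slot per policy is enough for the sketch. */
static struct task *rt_queued, *fair_queued;

static void rt_enqueue(struct task *t)
{
	assert(rt_queued == NULL);
	rt_queued = t;
}

static void fair_enqueue(struct task *t)
{
	assert(fair_queued == NULL);
	fair_queued = t;
}

static struct task *rt_pick(void)
{
	struct task *t = rt_queued;

	rt_queued = NULL;
	return t;
}

static struct task *fair_pick(void)
{
	struct task *t = fair_queued;

	fair_queued = NULL;
	return t;
}

static const struct policy_class fair_policy = {
	.name = "fair", .next = NULL,
	.enqueue = fair_enqueue, .pick_next = fair_pick,
};

static const struct policy_class rt_policy = {
	.name = "rt", .next = &fair_policy,
	.enqueue = rt_enqueue, .pick_next = rt_pick,
};

/* The core never looks inside a policy: the first one with a task wins. */
static struct task *core_pick_next(void)
{
	const struct policy_class *c;

	for (c = &rt_policy; c != NULL; c = c->next) {
		struct task *t = c->pick_next();

		if (t != NULL)
			return t;
	}
	return NULL;   /* nothing runnable: idle */
}
```

In this shape, what can be swapped is a policy's operations; the core loop that strings the policies together stays fixed, which is the distinction drawn here.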

Ok so what it means is that instead of whole cpu schedulers being able to be 
plugged into this framework we can plug in only cpu scheduling policies 
hrm... So let's look on

-#define SCHED_NORMAL   0

Ok, once upon a time we renamed SCHED_OTHER, which every other unix calls the
standard policy and which 99.9% of applications use, to a more meaningful name,
SCHED_NORMAL. That's fine since all it did was change the description
internally for those reading the code. Let's see what you've done now:

Re: 2.6.21-rc5 possible regression: KDE processes die silently [fixed]

2007-04-14 Thread Sid Boyce


On 03.04.2007 00:50, Adrian Bunk wrote:
>> We also have one bug kwin ran into that got fixed after -rc5:
>>
>> Subject    : kwin dies silently
>> References : http://lkml.org/lkml/2007/2/28/112
>> Submitter  : Sid Boyce <[EMAIL PROTECTED]>
>>              Boris Mogwitz <[EMAIL PROTECTED]>
>>              Michael Wu <[EMAIL PROTECTED]>
>> Caused-By  : Eric W. Biederman <[EMAIL PROTECTED]>
>>              commit 0475ac0845f9295bc5f69af45f58dff2c104c8d1
>> Fixed-By   : Eric W. Biederman <[EMAIL PROTECTED]>
>> Commit     : 14e9d5730adfca26452b3a2838a80af6950556f5
>> Status     : fixed in -rc6
>>
> The machine has been running -rc5-git12, then -rc6 for a total
> of four days now, and the problem hasn't reoccurred. Looks like
> it was indeed the same bug.
>
> Thanks,
> Tilman

No further problems since the patch was applied. Currently running 
2.6.21-rc6-git7.

Regards
Sid.

--
Sid Boyce ... Hamradio License G3VBV, Licensed Private Pilot
Emeritus IBM/Amdahl Mainframes and Sun/Fujitsu Servers Tech Support Specialist, 
Cricket Coach
Microsoft Windows Free Zone - Linux used for all Computing Tasks



Re: {Spam?} Re: {Spam?} [PATCH] NET: Remove obsolete traffic shaper code.

2007-04-14 Thread Ian McDonald


On 4/15/07, Robert P. J. Day <[EMAIL PROTECTED]> wrote:
>
> in fact, according to this:
>
> http://lkml.org/lkml/2006/1/13/139
>
> that notice was put in the feature removal file well over a year ago,
> during 2.6.15.  so that would seem to be more than adequate time for
> everyone to prepare for it.  but it must have been deleted from that
> file since then as well.


Yes and that was never merged and so was resent on January 19th, 2006:
http://www.nabble.com/-2.6-patch--schedule-SHAPER-for-removal-t949871.html

At that point people debated about it being too short notice and the
patch never went in.

I therefore think we can't just remove with NO notice.

Ian
--
Web: http://wand.net.nz/~iam4/
Blog: http://iansblog.jandi.co.nz
WAND Network Research Group

Re: [linux-usb-devel] 2.6.21-rc6 USB Keyboard hangs (EHCI/UHCI/HID)

2007-04-14 Thread Alan Stern

On Sat, 14 Apr 2007, Matthias Andree wrote:

> I arrived at the computer today, to find khubd in D state again, but
> unfortunately, it does not show up in Alt-SysRq-T output. Do kernel
> threads show up there at all? 2.6.18.8-0.1 with SUSE patches on openSUSE
> 10.2.

As far as I know, all tasks including kernel threads should show up.  They 
certainly do on my machine.  However you're the second person to report 
that khubd doesn't appear after getting stuck in a D state.

Clearly something strange is going on.  I wish I knew what it was...

> I had stopped usbmon before however, so I don't know what the events
> have been leading to this lockup. Sorry. I'll leave usbmon running now.

Something else you can do is turn on CONFIG_USB_DEBUG and then check the
dmesg log for anything significant.

Alan Stern


Re: [lm-sensors] Hardware monitoring subsystem maintainer positionis open

2007-04-14 Thread Dmitry Torokhov

Hi,

On Thursday 12 April 2007 03:27, Hans de Goede wrote:
> Krzysztof Helt wrote:
> 
> >> * Must follow kernel coding style guidelines
> > 
> > Is there any tool to check this? If there is one, a basic
> > instruction how to use it would be great.
> > 
> 
> No tool.

Passing new drivers through scripts/Lindent and analyzing the
differences can highlight differences between kernel style and the
author's (note that I do not advocate mechanically passing
everything through Lindent).

-- 
Dmitry

Problem with ufs nextstep in 2.6.18 (debian)

2007-04-14 Thread Dale Amon

I recently noticed that I can no longer read my 
images of NeXTstep floppies on certain machines.
All are running an up-to-date Etch distribution
but the difference between where I can read or not
read seems to be the linux version. On a 2.6.18
machine:

# mount -t ufs -o ro,ufstype=nextstep,loop nextfloppy-fd0a.ufs /floppy
# ls /floppy
ls: reading directory /floppy: Input/output error

On a 2.6.13 machine it still works fine:

# mount -t ufs -o ro,ufstype=nextstep,loop nextfloppy-fd0a.ufs /floppy
# ls /floppy/
private
# ls /floppy/private
Drivers

Have there been any recent changes that would cause a 
breakage in this area?


Re: kernel BUG at net/core/skbuff.c in linux-2.6.21-rc6

2007-04-14 Thread Paul Mackerras

I wrote:

> So this doesn't change process_input_packet(), which treats the case
> where the first byte is 0xff (PPP_ALLSTATIONS) but the second byte is
> 0x03 (PPP_UI) as indicating a packet with a PPP protocol number of

I meant "the second byte is NOT 0x03", of course.

Paul.

Re: kernel BUG at net/core/skbuff.c in linux-2.6.21-rc6

2007-04-14 Thread Paul Mackerras

David Miller writes:

> Here is Patrick McHardy's patch:

So this doesn't change process_input_packet(), which treats the case
where the first byte is 0xff (PPP_ALLSTATIONS) but the second byte is
not 0x03 (PPP_UI) as indicating a packet with a PPP protocol number of
0xff.  Arguably that's wrong since PPP protocol 0xff is reserved, and
the RFC does envision the possibility of receiving frames where the
control field has values other than 0x03.

Therefore I think this patch is probably better.  Could people try it
out and let me know if it fixes the problem?

Paul.

diff --git a/drivers/net/ppp_async.c b/drivers/net/ppp_async.c
index 933e2f3..caabbc4 100644
--- a/drivers/net/ppp_async.c
+++ b/drivers/net/ppp_async.c
@@ -802,9 +802,9 @@ process_input_packet(struct asyncppp *ap)
 
/* check for address/control and protocol compression */
p = skb->data;
-   if (p[0] == PPP_ALLSTATIONS && p[1] == PPP_UI) {
+   if (p[0] == PPP_ALLSTATIONS) {
/* chop off address/control */
-   if (skb->len < 3)
+   if (p[1] != PPP_UI || skb->len < 3)
goto err;
p = skb_pull(skb, 2);
}
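
The effect of the patched check can be modeled outside the kernel as a small C function, assuming a frame passed as a byte buffer whose FCS has already been stripped. ppp_header_offset is a hypothetical name; PPP_ALLSTATIONS and PPP_UI are redefined locally with the values given above. Unlike the patch, the sketch checks the length before reading p[1], since it cannot assume the buffer holds at least two bytes.

```c
#include <assert.h>
#include <stddef.h>

#define PPP_ALLSTATIONS 0xff   /* all-stations address field */
#define PPP_UI          0x03   /* unnumbered-information control field */

/* Returns the offset at which the protocol field starts, or -1 if the
 * frame would be rejected (the "goto err" path in the patched code). */
static int ppp_header_offset(const unsigned char *p, size_t len)
{
	assert(p != NULL || len == 0);
	if (len >= 1 && p[0] == PPP_ALLSTATIONS) {
		/* Address present: control must be UI and a protocol byte
		 * must follow; otherwise reject instead of treating 0xff
		 * as the protocol number. */
		if (len < 3 || p[1] != PPP_UI)
			return -1;
		return 2;   /* chop off address/control */
	}
	return 0;           /* address/control field compression */
}
```

A frame starting 0xff 0x01 is now rejected instead of being parsed as carrying the reserved protocol 0xff.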

Re: [PATCH 3/7] [RFC] Battery monitoring class

2007-04-14 Thread Anton Vorontsov

On Fri, Apr 13, 2007 at 05:49:39PM +0400, Anton Vorontsov wrote:
> I'll convert mXh to uXh a bit later, if there are no further objections
> against uXh. Also I'd like to hear if there are any objections to the
> mA/mV -> uA/uV conversion. I think we'd better keep all units at the
> same order/precision.

Okay, would it make sense to use "long" instead of "int" after the "milli" to
"micro" conversion? On 32-bit machines int gives a +-2147483648 limit, so
2147 volts/amperes/...

Though 2147 amperes is unrealistic for batteries, it could be dangerous
when used in calculations.

For example:
di->life_sec = -((di->accum_current_uAh - di->empty_uAh) *
 3600) / di->current_uA;

It can also be solved (and I'm voting for it) by typecasting to long
in the driver itself.

Would it also make sense to use int64_t instead of long? And how should
it be passed to printk in a portable way? I guess printk (vsprintf) does not
support PRIx notation as defined in /usr/include/inttypes.h ?
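
A minimal userspace sketch of the quoted life_sec calculation, assuming the three fields are 32-bit micro-unit values (the struct name batt_info and the helper names are hypothetical). With micro-units, 1 Ah is 1,000,000 uAh, and multiplying by 3600 already gives 3.6e9, past INT32_MAX. Note that on 32-bit Linux ABIs long is also 32 bits, so a cast to long would not widen the product there; a 64-bit intermediate does. The kernel does not provide the <inttypes.h> PRI* macros; the usual in-kernel idiom is to cast to long long and print with %lld, while PRId64 below is the userspace equivalent.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver fields in the quoted expression. */
struct batt_info {
	int32_t accum_current_uAh;
	int32_t empty_uAh;
	int32_t current_uA;   /* negative while discharging */
};

/* Same formula as the quoted driver line, but the product is formed in
 * int64_t so (accum - empty) * 3600 cannot overflow 32 bits. */
static int64_t life_sec(const struct batt_info *di)
{
	assert(di->current_uA != 0);
	return -(((int64_t)di->accum_current_uAh - di->empty_uAh) * 3600) /
	       di->current_uA;
}

/* Portable userspace formatting of an int64_t via PRId64. */
static int format_life(char *buf, size_t n, int64_t secs)
{
	return snprintf(buf, n, "%" PRId64 " s", secs);
}
```

For example, 2,000,000 uAh above empty drained at 500,000 uA gives 14400 seconds, even though the intermediate product (7.2e9) does not fit in 32 bits.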

-- 
Anton Vorontsov
email: [EMAIL PROTECTED]
backup email: [EMAIL PROTECTED]
irc://irc.freenode.org/bd2

[PATCH] cxacru: ADSL state management

2007-04-14 Thread Simon Arlott

The device has commands to start/stop the ADSL function, so this adds 
a sysfs attribute to allow it to be started/stopped/restarted. It also 
stops polling the device for status when the ADSL function is disabled.


There are no problems with sending multiple start or stop commands, 
even with a fast loop of them the device still works. There is no 
need to protect the restart process from further user actions while 
it's waiting for 1.5s.
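
Sysfs store handlers receive the raw bytes userspace wrote, and a plain `echo stop > adsl_state` writes "stop\n", so an exact strcmp() against "stop", as in the store handler in this patch, does not match it. A minimal userspace sketch of a parser for the start/stop/restart commands that tolerates one trailing newline follows; the enum and function names are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command set for an "adsl_state"-style sysfs attribute. */
enum adsl_cmd { ADSL_CMD_INVALID, ADSL_CMD_START, ADSL_CMD_STOP, ADSL_CMD_RESTART };

/* Match buf[0..count) against a command word, ignoring one trailing '\n'
 * such as the one `echo stop > adsl_state` writes. */
static enum adsl_cmd parse_adsl_cmd(const char *buf, size_t count)
{
	static const struct {
		const char *word;
		enum adsl_cmd cmd;
	} table[] = {
		{ "start", ADSL_CMD_START },
		{ "stop", ADSL_CMD_STOP },
		{ "restart", ADSL_CMD_RESTART },
	};

	assert(buf != NULL || count == 0);
	if (count > 0 && buf[count - 1] == '\n')
		count--;
	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		size_t len = strlen(table[i].word);

		if (count == len && memcmp(buf, table[i].word, len) == 0)
			return table[i].cmd;
	}
	return ADSL_CMD_INVALID;
}
```

Comparing against the exact length also rejects prefixes and extensions such as "sto" or "stopp".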


Signed-off-by: Simon Arlott <[EMAIL PROTECTED]>
Cc: Greg Kroah-Hartman <[EMAIL PROTECTED]>
Cc: Duncan Sands <[EMAIL PROTECTED]>
---
This patch requires usb-cxacru-export-detailed-device-info-through-sysfs 
from gregkh-2.6.


drivers/usb/atm/cxacru.c |  215 --
1 files changed, 206 insertions(+), 9 deletions(-)

diff --git a/drivers/usb/atm/cxacru.c b/drivers/usb/atm/cxacru.c
index c8b69bf..a89c484 100644
--- a/drivers/usb/atm/cxacru.c
+++ b/drivers/usb/atm/cxacru.c
@@ -4,6 +4,7 @@
 *
 *  Copyright (C) 2004 David Woodhouse, Duncan Sands, Roman Kagan
 *  Copyright (C) 2005 Duncan Sands, Roman Kagan (rkagan % mail ! ru)
+ *  Copyright (C) 2007 Simon Arlott
 *
 *  This program is free software; you can redistribute it and/or modify it
 *  under the terms of the GNU General Public License as published by the Free
@@ -145,6 +146,13 @@ enum cxacru_info_idx {
/* dunno what the missing two mean */
CXINF_MAX = 0x1c,
};
+ 
+enum cxacru_poll_state {
+   CXPOLL_STOPPING,
+   CXPOLL_STOPPED,
+   CXPOLL_POLLING,
+   CXPOLL_SHUTDOWN
+};

struct cxacru_modem_type {
u32 pll_f_clk;
@@ -158,8 +166,11 @@ struct cxacru_data {
const struct cxacru_modem_type *modem_type;

int line_status;
+   int adsl_status;
struct delayed_work poll_work;
u32 card_info[CXINF_MAX];
+   struct mutex poll_state_serialize;
+   int poll_state;

/* contol handles */
struct mutex cm_serialize;
@@ -171,10 +182,18 @@ struct cxacru_data {
struct completion snd_done;
};

+static int cxacru_cm(struct cxacru_data *instance, enum cxacru_cm_request cm,
+   u8 *wdata, int wsize, u8 *rdata, int rsize);
+static void cxacru_poll_status(struct work_struct *work);
+
/* Card info exported through sysfs */
#define CXACRU__ATTR_INIT(_name) \
static DEVICE_ATTR(_name, S_IRUGO, cxacru_sysfs_show_##_name, NULL)

+#define CXACRU_CMD_INIT(_name) \
+static DEVICE_ATTR(_name, S_IWUSR | S_IRUGO, \
+   cxacru_sysfs_show_##_name, cxacru_sysfs_store_##_name)
+
#define CXACRU_ATTR_INIT(_value, _type, _name) \
static ssize_t cxacru_sysfs_show_##_name(struct device *dev, \
struct device_attribute *attr, char *buf) \
@@ -187,9 +206,11 @@ static ssize_t cxacru_sysfs_show_##_name(struct device *dev, \
CXACRU__ATTR_INIT(_name)

#define CXACRU_ATTR_CREATE(_v, _t, _name) CXACRU_DEVICE_CREATE_FILE(_name)
+#define CXACRU_CMD_CREATE(_name)  CXACRU_DEVICE_CREATE_FILE(_name)
#define CXACRU__ATTR_CREATE(_name)        CXACRU_DEVICE_CREATE_FILE(_name)

#define CXACRU_ATTR_REMOVE(_v, _t, _name) CXACRU_DEVICE_REMOVE_FILE(_name)
+#define CXACRU_CMD_REMOVE(_name)  CXACRU_DEVICE_REMOVE_FILE(_name)
#define CXACRU__ATTR_REMOVE(_name)        CXACRU_DEVICE_REMOVE_FILE(_name)

static ssize_t cxacru_sysfs_showattr_u32(u32 value, char *buf)
@@ -278,6 +299,105 @@ static ssize_t cxacru_sysfs_show_mac_address(struct device *dev,
atm_dev->esi[3], atm_dev->esi[4], atm_dev->esi[5]);
}

+static ssize_t cxacru_sysfs_show_adsl_state(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   struct usb_interface *intf = to_usb_interface(dev);
+   struct usbatm_data *usbatm_instance = usb_get_intfdata(intf);
+   struct cxacru_data *instance = usbatm_instance->driver_data;
+   u32 value = instance->card_info[CXINF_LINE_STARTABLE];
+
+   switch (value) {
+   case 0: return snprintf(buf, PAGE_SIZE, "running\n");
+   case 1: return snprintf(buf, PAGE_SIZE, "stopped\n");
+   default: return snprintf(buf, PAGE_SIZE, "unknown (%u)\n", value);
+   }
+}
+
+static ssize_t cxacru_sysfs_store_adsl_state(struct device *dev,
+   struct device_attribute *attr, const char *buf, size_t count)
+{
+   struct usb_interface *intf = to_usb_interface(dev);
+   struct usbatm_data *usbatm_instance = usb_get_intfdata(intf);
+   struct cxacru_data *instance = usbatm_instance->driver_data;
+   int ret = 0;
+   int poll = -1;
+
+   if (!capable(CAP_NET_ADMIN))
+   return -EACCES;
+
+   if (!strcmp(buf, "stop") || !strcmp(buf, "restart")) {
+   ret = cxacru_cm(instance, CM_REQUEST_CHIP_ADSL_LINE_STOP, NULL, 0, NULL, 0);
+   if (ret < 0) {
+   atm_err(usbatm_instance, "change adsl state:"
+   " CHIP_ADSL_LINE_STOP returned %d\n", ret);
+
+   ret = -EIO;
+   } else {
+   ret =

Re: GIT and the current -stable

2007-04-14 Thread Julian Phillips


On Sun, 15 Apr 2007, Rene Herman wrote:

> "v2.6.20.7" seems to be the only tag from the stable branches that's present
> in this tree?
>
> [EMAIL PROTECTED]:[...]$ git tag -l | grep "v2\.6\.[[:digit:]]\{1,2\}\."
> v2.6.20.7


Obviously I don't know how Chris created his conglomerated repo, but I 
just made one of my own, and it has all the tags I would expect in it ... 
so it's not an inherent git problem (or not in 1.5.1.1 anyway).


I guess that Chris created his in such a way that the automated tag 
following code didn't trigger? (Or maybe used a really old git?)


(Mine's at http://git.q42.co.uk/w/stable.git if anyone is interested 
enough to want to look at it ...)


It only took me 4 commands to create too (ok, so three of them were bash 
for loops ... and I did do a little bit more to tidy up), I do enjoy using 
flexible tools :D.


--
Julian

 ---
Go slowly to the entertainments of thy friends, but quickly to their
misfortunes.
-- Chilo

[PATCH] usbatm: Detect usb device shutdown and ignore failed urbs.

2007-04-14 Thread Simon Arlott


Detect usb device shutdown and ignore failed urbs.
This happens when the driver is unloaded or the device is unplugged.

Signed-off-by: Simon Arlott <[EMAIL PROTECTED]>
Cc: Duncan Sands <[EMAIL PROTECTED]>
---
I'm not sure what other urb statuses should be ignored,
and the warning message doesn't need to be shown when
the module is unloaded or the device is removed.

drivers/usb/atm/usbatm.c |    3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/atm/usbatm.c b/drivers/usb/atm/usbatm.c
index d0f1976..6b31175 100644
--- a/drivers/usb/atm/usbatm.c
+++ b/drivers/usb/atm/usbatm.c
@@ -274,6 +274,9 @@ static void usbatm_complete(struct urb *urb)
(!(channel->usbatm->flags & UDSL_IGNORE_EILSEQ) ||
 urb->status != -EILSEQ ))
{
+   if (urb->status == -ESHUTDOWN)
+   return;
+
if (printk_ratelimit())
atm_warn(channel->usbatm, "%s: urb 0x%p failed (%d)!\n",
__func__, urb, urb->status);
--
1.5.0.1
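
Simon's open question is which other completion statuses to treat like -ESHUTDOWN. The USB core's documented error codes list three that mean the URB was cancelled or the device went away rather than a transfer failing: -ENOENT (killed via usb_kill_urb()), -ECONNRESET (unlinked via usb_unlink_urb()) and -ESHUTDOWN (device or host controller disabled, e.g. by unplugging). A hypothetical helper grouping them is sketched below in userspace C; whether usbatm should silence all three is a judgment for the driver's maintainers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: true for completion statuses that signal teardown
 * (cancellation or device removal) rather than a failed transfer. */
static bool urb_status_is_teardown(int status)
{
	switch (status) {
	case -ENOENT:      /* URB killed, e.g. by usb_kill_urb() */
	case -ECONNRESET:  /* URB unlinked by usb_unlink_urb() */
	case -ESHUTDOWN:   /* device or host controller disabled/unplugged */
		return true;
	default:
		return false;
	}
}
```

A completion handler could return early when this is true and keep the rate-limited warning for everything else, such as -EILSEQ.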

--
Simon Arlott

{Spam?} Re: {Spam?} [PATCH] NET: Remove obsolete traffic shaper code.

2007-04-14 Thread Robert P. J. Day

(i'm betting that the mail server i use back in canada is going to tag
this yet again with "{Spam?}" since i'm in california at the moment
and i'll just bet it's freaking out seeing stuff coming from a totally
unknown IP address.  i've already sent an email to the admins about
this.  sorry.)

On Sun, 15 Apr 2007, Ian McDonald wrote:

> On 4/15/07, Robert P. J. Day <[EMAIL PROTECTED]> wrote:
> >
> > Remove the obsolete code for the traffic shaper.
> >
> > Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]>
> >
> Apart from the merits of removing this which I can't comment on, I
> thought the usual procedure was to place a removal in
> Documentation/feature-removal-schedule.txt to notify people of what
> is going to be removed. Then wait the period you determine there and
> then remove.

in fact, according to this:

http://lkml.org/lkml/2006/1/13/139

that notice was put in the feature removal file well over a year ago,
during 2.6.15.  so that would seem to be more than adequate time for
everyone to prepare for it.  but it must have been deleted from that
file since then as well.

i probably should have mentioned that in my initial posting.

rday
-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://fsdev.net/wiki/index.php?title=Main_Page


Re: {Spam?} Re: {Spam?} [PATCH] NET: Remove obsolete traffic shaper code.

2007-04-14 Thread Rene Herman


On 04/15/2007 01:38 AM, Robert P. J. Day wrote:


Why are all your messages getting a "{Spam?}" subject prefix?


i have no idea, that's a recent development.  is that happening with
anyone else?


Not that I've seen. Your last message/thread were the others:

http://lkml.org/lkml/2007/4/14/89

Rene.


Re: {Spam?} [PATCH] NET: Remove obsolete traffic shaper code.

2007-04-14 Thread Ian McDonald


On 4/15/07, Robert P. J. Day <[EMAIL PROTECTED]> wrote:


Remove the obsolete code for the traffic shaper.

Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]>


Apart from the merits of removing this which I can't comment on, I
thought the usual procedure was to place a removal in
Documentation/feature-removal-schedule.txt to notify people of what is
going to be removed. Then wait the period you determine there and then
remove.

Ian
--
Web: http://wand.net.nz/~iam4/
Blog: http://iansblog.jandi.co.nz
WAND Network Research Group

{Spam?} Re: {Spam?} [PATCH] NET: Remove obsolete traffic shaper code.

2007-04-14 Thread Robert P. J. Day

On Sun, 15 Apr 2007, Rene Herman wrote:

> On 04/15/2007 01:30 AM, Robert P. J. Day wrote:
>
> > Remove the obsolete code for the traffic shaper.
>
> Why are all your messages getting a "{Spam?}" subject prefix?

i have no idea, that's a recent development.  is that happening with
anyone else?

rday
-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://fsdev.net/wiki/index.php?title=Main_Page


Re: {Spam?} [PATCH] NET: Remove obsolete traffic shaper code.

2007-04-14 Thread Rene Herman


On 04/15/2007 01:30 AM, Robert P. J. Day wrote:


Remove the obsolete code for the traffic shaper.


Why are all your messages getting a "{Spam?}" subject prefix?

Rene.

Re: intermittant petabyte usage reported with broadcom nic

2007-04-14 Thread Michael Chan

On Mon, 2007-04-02 at 17:41 +1000, CaT wrote:
> On Mon, Apr 02, 2007 at 12:13:00AM -0700, Andrew Morton wrote:
> > On Mon, 2 Apr 2007 11:43:19 +1000 CaT <[EMAIL PROTECTED]> wrote:
> > 
> > > I take minute by minute snapshots of network traffic by sampling
> > > /proc/net/dev and most of the time everything works fine. Occasionally
> > > though I get petabyte byte traffic and corresponding packet traffic.
> > 
> > How frequently?
> 
> I can count about 6 over the past month.
> 
I did a quick test on a 64-bit kernel and did not see any problem with
the counters.  I'll ask the lab to set up a longer term test and monitor
the counters for bogus values.

I also like Andi's idea of using change_page_attr() to isolate the
problem.  I'll try to send you a debug patch in the next few days to try
that out.  Thanks.


{Spam?} [PATCH] NET: Remove obsolete traffic shaper code.

2007-04-14 Thread Robert P. J. Day


Remove the obsolete code for the traffic shaper.

Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]>

---

nothing seems to be using this, it's labelled "OBSOLETE" in the
Kconfig file, and there is not a single test for CONFIG_SHAPER
anywhere in the tree.  time to die.


 Documentation/networking/shaper.txt |   48 -
 arch/um/config.release  |1
 drivers/net/Kconfig |   17
 drivers/net/Makefile|1
 drivers/net/shaper.c|  651 --
 5 files changed, 718 deletions(-)

diff --git a/Documentation/networking/shaper.txt b/Documentation/networking/shaper.txt
deleted file mode 100644
index 6c4ebb6..000
--- a/Documentation/networking/shaper.txt
+++ /dev/null
@@ -1,48 +0,0 @@
-Traffic Shaper For Linux
-
-This is the current BETA release of the traffic shaper for Linux. It works
-within the following limits:
-
-o  Minimum shaping speed is currently about 9600 baud (it can only
-shape down to 1 byte per clock tick)
-
-o  Maximum is about 256K, it will go above this but get a bit blocky.
-
-o  If you ifconfig the master device that a shaper is attached to down
-then your machine will follow.
-
-o  The shaper must be a module.
-
-
-Setup:
-
-   A shaper device is configured using the shapeconfig program.
-Typically you will do something like this
-
-shapecfg attach shaper0 eth1
-shapecfg speed shaper0 64000
-ifconfig shaper0 myhost netmask 255.255.255.240 broadcast 1.2.3.4.255 up
-route add -net some.network netmask a.b.c.d dev shaper0
-
-The shaper should have the same IP address as the device it is attached to
-for normal use.
-
-Gotchas:
-
-   The shaper shapes transmitted traffic. It's rather impossible to
-shape received traffic except at the end (or a router) transmitting it.
-
-   Gated/routed/rwhod/mrouted all see the shaper as an additional device
-and will treat it as such unless patched. Note that for mrouted you can run
-mrouted tunnels via a traffic shaper to control bandwidth usage.
-
-   The shaper is device/route based. This makes it very easy to use
-with any setup BUT less flexible. You may need to use iproute2 to set up
-multiple route tables to get the flexibility.
-
-   There is no "borrowing" or "sharing" scheme. This is a simple
-traffic limiter. We implement Van Jacobson and Sally Floyd's CBQ
-architecture into Linux 2.2. This is the preferred solution. Shaper is
-for simple or back compatible setups.
-
-Alan
diff --git a/arch/um/config.release b/arch/um/config.release
index fc68bcb..4495b90 100644
--- a/arch/um/config.release
+++ b/arch/um/config.release
@@ -174,7 +174,6 @@ CONFIG_SLIP_SMART=y
 # CONFIG_TR is not set
 # CONFIG_NET_FC is not set
 # CONFIG_RCPCI is not set
-CONFIG_SHAPER=m

 #
 # Wan interfaces
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index c3f9f59..a21f004 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2899,23 +2899,6 @@ config NET_FC
  adaptor below. You also should have said Y to "SCSI support" and
  "SCSI generic support".

-config SHAPER
-   tristate "Traffic Shaper (OBSOLETE)"
-   depends on EXPERIMENTAL
-   ---help---
- The traffic shaper is a virtual network device that allows you to
- limit the rate of outgoing data flow over some other network device.
- The traffic that you want to slow down can then be routed through
- these virtual devices. See
-  for more information.
-
- An alternative to this traffic shaper are traffic schedulers which
- you'll get if you say Y to "QoS and/or fair queuing" in
- "Networking options".
-
- To compile this driver as a module, choose M here: the module
- will be called shaper.  If unsure, say N.
-
 config NETCONSOLE
tristate "Network console logging support (EXPERIMENTAL)"
depends on EXPERIMENTAL
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 33af833..30721ea 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -85,7 +85,6 @@ obj-$(CONFIG_NET_SB1000) += sb1000.o
 obj-$(CONFIG_MAC8390) += mac8390.o
 obj-$(CONFIG_APNE) += apne.o 8390.o
 obj-$(CONFIG_PCMCIA_PCNET) += 8390.o
-obj-$(CONFIG_SHAPER) += shaper.o
 obj-$(CONFIG_HP100) += hp100.o
 obj-$(CONFIG_SMC9194) += smc9194.o
 obj-$(CONFIG_FEC) += fec.o
diff --git a/drivers/net/shaper.c b/drivers/net/shaper.c
deleted file mode 100644
index e886e8d..000
--- a/drivers/net/shaper.c
+++ /dev/null
@@ -1,651 +0,0 @@
-/*
- * Simple traffic shaper for Linux NET3.
- *
- * (c) Copyright 1996 Alan Cox <[EMAIL PROTECTED]>, All Rights Reserved.
- * http://www.redhat.com
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- *

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-14 Thread Davide Libenzi

On Sat, 14 Apr 2007, Davide Libenzi wrote:

> Haven't looked at the scheduler code yet, but for a similar problem I use 
> a time ring. The ring has Ns (2 power is better) slots (where tasks are 
> queued - in my case they were some sort of timers), and it has a current 
> base index (Ib), a current base time (Tb) and a time granularity (Tg). It 
> also has a bitmap with bits telling you which slots contain queued tasks. 
> An item (task) that has to be scheduled at time T, will be queued in the slot:
> 
> S = Ib + min((T - Tb) / Tg, Ns - 1);

... mod Ns, of course ;)


- Davide



NFS: Fix an Oops in nfs_setattr()

2007-04-14 Thread Trond Myklebust

It looks like nfs_setattr() and nfs_rename() also need to test whether the
target is a regular file before calling nfs_wb_all()...

Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
---

 fs/nfs/dir.c   |3 ++-
 fs/nfs/inode.c |6 --
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 92d8ec8..cd34697 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1684,7 +1684,8 @@ go_ahead:
 * ... prune child dentries and writebacks if needed.
 */
if (atomic_read(&old_dentry->d_count) > 1) {
-   nfs_wb_all(old_inode);
+   if (S_ISREG(old_inode->i_mode))
+   nfs_wb_all(old_inode);
shrink_dcache_parent(old_dentry);
}
nfs_inode_return_delegation(old_inode);
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 93d046c..44aa9b7 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -341,8 +341,10 @@ nfs_setattr(struct dentry *dentry, struct iattr *attr)
lock_kernel();
nfs_begin_data_update(inode);
/* Write all dirty data */
-   filemap_write_and_wait(inode->i_mapping);
-   nfs_wb_all(inode);
+   if (S_ISREG(inode->i_mode)) {
+   filemap_write_and_wait(inode->i_mapping);
+   nfs_wb_all(inode);
+   }
/*
 * Return any delegations if we're going to change ACLs
 */

Re: GIT and the current -stable

2007-04-14 Thread Rene Herman


On 04/14/2007 10:54 AM, Rene Herman wrote:


On 04/14/2007 10:34 AM, Chris Wright wrote:


I've already put a tree like this up on kernel.org.  The master branch
is Linus' tree, and there's branches for each of the stable releases
called linux-2.6.[12-20].y (I didn't add 2.6.11.y).

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6-stable.git;a=summary 


I see, thank you; that sounds like a good "master" repo to clone then.


Okay, I just cloned this repo, like:

git clone -n \
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6-stable.git \
stable

How do I now checkout for example 2.6.20.6, or get a diff between 2.6.20.6 
and the (at the moment of writing latest -stable) 2.6.20.7?


[EMAIL PROTECTED]:~/src/linux$ cd stable
[EMAIL PROTECTED]:~/src/linux/stable$ git branch -a
* master
  origin/HEAD
  origin/linux-2.6
  origin/linux-2.6.12.y
  origin/linux-2.6.13.y
  origin/linux-2.6.14.y
  origin/linux-2.6.15.y
  origin/linux-2.6.16.y
  origin/linux-2.6.17.y
  origin/linux-2.6.18.y
  origin/linux-2.6.19.y
  origin/linux-2.6.20.y
  origin/master

and I can check them out like

[EMAIL PROTECTED]:~/src/linux/stable$ git checkout origin/linux-2.6.20.y

or in this case, but only this case, like:

[EMAIL PROTECTED]:~/src/linux/stable$ git checkout v2.6.20.7

"v2.6.20.7" seems to be the only tag from the stable branches that's present in this tree?


[EMAIL PROTECTED]:[...]$ git tag -l | grep "v2\.6\.[[:digit:]]\{1,2\}\."
v2.6.20.7

Rene.


NFS: Ensure PG_writeback is cleared when writeback fails

2007-04-14 Thread Trond Myklebust

If the writebacks are cancelled via nfs_cancel_dirty_list, or due to the
memory allocation failing in nfs_flush_one/nfs_flush_multi, then we must
ensure that the PG_writeback flag is cleared.

Also ensure that we actually own the PG_writeback flag whenever we
schedule a new writeback by making nfs_set_page_writeback() return the
value of test_set_page_writeback().
The PG_writeback page flag ends up replacing the functionality of the
PG_FLUSHING nfs_page flag, so we rip that out too.

Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
Cc: Peter Zijlstra <[EMAIL PROTECTED]>
---

 fs/nfs/write.c   |   22 +++---
 include/linux/nfs_page.h |1 -
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 2867e6b..e5d7cac 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -218,9 +218,11 @@ int nfs_congestion_kb;
 #define NFS_CONGESTION_OFF_THRESH  \
(NFS_CONGESTION_ON_THRESH - (NFS_CONGESTION_ON_THRESH >> 2))
 
-static void nfs_set_page_writeback(struct page *page)
+static int nfs_set_page_writeback(struct page *page)
 {
-   if (!test_set_page_writeback(page)) {
+   int ret = test_set_page_writeback(page);
+
+   if (!ret) {
struct inode *inode = page->mapping->host;
struct nfs_server *nfss = NFS_SERVER(inode);
 
@@ -228,6 +230,7 @@ static void nfs_set_page_writeback(struct page *page)
NFS_CONGESTION_ON_THRESH)
set_bdi_congested(&nfss->backing_dev_info, WRITE);
}
+   return ret;
 }
 
 static void nfs_end_page_writeback(struct page *page)
@@ -277,10 +280,8 @@ static int nfs_page_mark_flush(struct page *page)
spin_lock(req_lock);
}
spin_unlock(req_lock);
-   if (test_and_set_bit(PG_FLUSHING, &req->wb_flags) == 0) {
+   if (nfs_set_page_writeback(page) == 0)
nfs_mark_request_dirty(req);
-   nfs_set_page_writeback(page);
-   }
ret = test_bit(PG_NEED_FLUSH, &req->wb_flags);
nfs_unlock_request(req);
return ret;
@@ -424,7 +425,6 @@ nfs_mark_request_dirty(struct nfs_page *req)
 static void
 nfs_redirty_request(struct nfs_page *req)
 {
-   clear_bit(PG_FLUSHING, &req->wb_flags);
__set_page_dirty_nobuffers(req->wb_page);
 }
 
@@ -434,7 +434,11 @@ nfs_redirty_request(struct nfs_page *req)
 static inline int
 nfs_dirty_request(struct nfs_page *req)
 {
-   return test_bit(PG_FLUSHING, &req->wb_flags) == 0;
+   struct page *page = req->wb_page;
+
+   if (page == NULL)
+   return 0;
+   return !PageWriteback(req->wb_page);
 }
 
 #if defined(CONFIG_NFS_V3) || defined(CONFIG_NFS_V4)
@@ -500,6 +504,7 @@ static void nfs_cancel_dirty_list(struct list_head *head)
while(!list_empty(head)) {
req = nfs_list_entry(head->next);
nfs_list_remove_request(req);
+   nfs_end_page_writeback(req->wb_page);
nfs_inode_remove_request(req);
nfs_clear_page_writeback(req);
}
@@ -890,6 +895,7 @@ out_bad:
list_del(&data->pages);
nfs_writedata_release(data);
}
+   nfs_end_page_writeback(req->wb_page);
nfs_redirty_request(req);
nfs_clear_page_writeback(req);
return -ENOMEM;
@@ -935,6 +941,7 @@ static int nfs_flush_one(struct inode *inode, struct list_head *head, int how)
while (!list_empty(head)) {
struct nfs_page *req = nfs_list_entry(head->next);
nfs_list_remove_request(req);
+   nfs_end_page_writeback(req->wb_page);
nfs_redirty_request(req);
nfs_clear_page_writeback(req);
}
@@ -970,6 +977,7 @@ out_err:
while (!list_empty(head)) {
req = nfs_list_entry(head->next);
nfs_list_remove_request(req);
+   nfs_end_page_writeback(req->wb_page);
nfs_redirty_request(req);
nfs_clear_page_writeback(req);
}
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 2e555d4..d111be6 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -31,7 +31,6 @@
 #define PG_NEED_COMMIT 1
 #define PG_NEED_RESCHED 2
 #define PG_NEED_FLUSH  3
-#define PG_FLUSHING 4
 
 struct nfs_inode;
 struct nfs_page {

NFS: Fix two bugs in the O_DIRECT write code

2007-04-14 Thread Trond Myklebust

Do not flag an error if the COMMIT call fails and we decide to resend the
writes. Let the resend flag the error if it fails.

If a write has failed, then nfs_direct_write_result should not attempt to
send a commit. It should just exit asap and return the error to the user.

Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
Cc: Chuck Lever <[EMAIL PROTECTED]>
---

 fs/nfs/direct.c |   11 +++
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index b1c98ea..2877744 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -432,10 +432,10 @@ static void nfs_direct_commit_result(struct rpc_task *task, void *calldata)
if (NFS_PROTO(data->inode)->commit_done(task, data) != 0)
return;
if (unlikely(task->tk_status < 0)) {
-   dreq->error = task->tk_status;
+   dprintk("NFS: %5u commit failed with error %d.\n",
+   task->tk_pid, task->tk_status);
dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
-   }
-   if (memcmp(&dreq->verf, &data->verf, sizeof(data->verf))) {
+   } else if (memcmp(&dreq->verf, &data->verf, sizeof(data->verf))) {
dprintk("NFS: %5u commit verify failed\n", task->tk_pid);
dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
}
@@ -531,9 +531,12 @@ static void nfs_direct_write_result(struct rpc_task *task, void *calldata)
 
spin_lock(&dreq->lock);
 
+   if (unlikely(dreq->error != 0))
+   goto out_unlock;
if (unlikely(status < 0)) {
+   /* An error has occured, so we should not commit */
+   dreq->flags = 0;
dreq->error = status;
-   goto out_unlock;
}
 
dreq->count += data->res.count;

Re: [linux-pm] [RFD] swsusp problem: Drivers allocate much memory during suspend (was: Re: 2.6.21-rc5: swsusp: Not enough free memory)

2007-04-14 Thread Rafael J. Wysocki

Hi,

On Saturday, 14 April 2007 00:35, Rafael J. Wysocki wrote:
> On Saturday, 14 April 2007 00:10, Pavel Machek wrote:
[--snip--] 
> > > IMO to really fix the problem, we should let the drivers that need much memory
> > > for suspending allocate it _before_ the memory shrinker is called.  For this
> > > purpose we can use notifiers that will be called before we start the shrinking
> > > of memory.  Namely, if a driver needs to allocate substantial amount of memory
> > 
> > Yes please. Using that notifier without leaking the memory will be
> > "interesting" but if someone needs so much memory during suspend, let
> > them eat their own complexity.
> 
> Okay, I'm going to prepare a patch along these lines.

The appended patch shows what I think this might look like.

There are a couple of things I'm not sure about in it:
1) Names (if you have better ideas, please tell me)
2) I don't know if SUSPEND_THAW_PREPARE is necessary; at least for now I don't
see any obvious case in which it may be useful, but I've added it for symmetry
3) Perhaps we can use the second argument of raw_notifier_call_chain() to pass
information about what kind of suspend is going to happen (e.g. STD vs STR), in
which case suspend_notifier_call_chain() will need a second argument

Greetings,
Rafael

---
 include/linux/notifier.h |6 +
 include/linux/suspend.h  |   27 ++---
 kernel/power/Makefile|2 -
 kernel/power/disk.c  |   21 
 kernel/power/main.c  |   13 
 kernel/power/notify.c|   49 +++
 kernel/power/power.h |3 ++
 kernel/power/user.c  |   19 ++
 8 files changed, 135 insertions(+), 5 deletions(-)

Index: linux-2.6.21-rc6/kernel/power/notify.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.21-rc6/kernel/power/notify.c  2007-04-14 23:35:35.0 +0200
@@ -0,0 +1,49 @@
+/*
+ * linux/kernel/power/notify.c
+ *
+ * This file contains functions used for registering and calling suspend
+ * notifiers that can be used by subsystems for carrying out some special
+ * suspend-related operations.
+ *
+ * Copyright (C) 2007 Rafael J. Wysocki <[EMAIL PROTECTED]>
+ *
+ * This file is released under the GPLv2.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+static DEFINE_MUTEX(suspend_notifier_lock);
+
+static RAW_NOTIFIER_HEAD(suspend_chain);
+
+int register_suspend_notifier(struct notifier_block *nb)
+{
+   int ret;
+   mutex_lock(&suspend_notifier_lock);
+   ret = raw_notifier_chain_register(&suspend_chain, nb);
+   mutex_unlock(&suspend_notifier_lock);
+   return ret;
+}
+EXPORT_SYMBOL(register_suspend_notifier);
+
+void unregister_suspend_notifier(struct notifier_block *nb)
+{
+   mutex_lock(&suspend_notifier_lock);
+   raw_notifier_chain_unregister(&suspend_chain, nb);
+   mutex_unlock(&suspend_notifier_lock);
+}
+EXPORT_SYMBOL(unregister_suspend_notifier);
+
+int suspend_notifier_call_chain(unsigned long val)
+{
+   int error;
+
+   mutex_lock(&suspend_notifier_lock);
+   error = raw_notifier_call_chain(&suspend_chain, val, NULL);
+   mutex_unlock(&suspend_notifier_lock);
+   return error;
+}
Index: linux-2.6.21-rc6/include/linux/suspend.h
===
--- linux-2.6.21-rc6.orig/include/linux/suspend.h   2007-04-14 22:04:58.0 +0200
+++ linux-2.6.21-rc6/include/linux/suspend.h    2007-04-14 23:32:18.0 +0200
@@ -24,15 +24,16 @@ struct pbe {
 extern void drain_local_pages(void);
 extern void mark_free_pages(struct zone *zone);
 
-#if defined(CONFIG_PM) && defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE)
+#ifdef CONFIG_PM
+#if defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE)
 extern int pm_prepare_console(void);
 extern void pm_restore_console(void);
 #else
 static inline int pm_prepare_console(void) { return 0; }
 static inline void pm_restore_console(void) {}
-#endif
+#endif /* defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE) */
 
-#if defined(CONFIG_PM) && defined(CONFIG_SOFTWARE_SUSPEND)
+#ifdef CONFIG_SOFTWARE_SUSPEND
 /* kernel/power/swsusp.c */
 extern int software_suspend(void);
 /* kernel/power/snapshot.c */
@@ -52,7 +53,7 @@ static inline void register_nosave_regio
 static inline int swsusp_page_is_forbidden(struct page *p) { return 0; }
 static inline void swsusp_set_page_free(struct page *p) {}
 static inline void swsusp_unset_page_free(struct page *p) {}
-#endif /* defined(CONFIG_PM) && defined(CONFIG_SOFTWARE_SUSPEND) */
+#endif /* CONFIG_SOFTWARE_SUSPEND */
 
 void save_processor_state(void);
 void restore_processor_state(void);
@@ -60,4 +61,22 @@ struct saved_context;
 void __save_processor_state(struct saved_context *ctxt);
 void __restore_processor_state(struct saved_context *ctxt);
 
+int register_suspend_notifier(struct notifier_block *nb);
+void unregister_suspend_notifier(struct

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-14 Thread Davide Libenzi

On Fri, 13 Apr 2007, William Lee Irwin III wrote:

> On Fri, Apr 13, 2007 at 10:21:00PM +0200, Ingo Molnar wrote:
> >The CFS patch uses a completely different approach and implementation
> >from RSDL/SD. My goal was to make CFS's interactivity quality exceed
> >that of RSDL/SD, which is a high standard to meet :-) Testing
> >feedback is welcome to decide this one way or another. [ and, in any
> >case, all of SD's logic could be added via a kernel/sched_sd.c module
> >as well, if Con is interested in such an approach. ]
> >CFS's design is quite radical: it does not use runqueues, it uses a
> >time-ordered rbtree to build a 'timeline' of future task execution,
> >and thus has no 'array switch' artifacts (by which both the vanilla
> >scheduler and RSDL/SD are affected).
> 
> A binomial heap would likely serve your purposes better than rbtrees.
> It's faster to have the next item to dequeue at the root of the tree
> structure rather than a leaf, for one. There are, of course, other
> priority queue structures (e.g. van Emde Boas) able to exploit the
> limited precision of the priority key for faster asymptotics, though
> actual performance is an open question.

Haven't looked at the scheduler code yet, but for a similar problem I use 
a time ring. The ring has Ns (2 power is better) slots (where tasks are 
queued - in my case they were some sort of timers), and it has a current 
base index (Ib), a current base time (Tb) and a time granularity (Tg). It 
also has a bitmap with bits telling you which slots contain queued tasks. 
An item (task) that has to be scheduled at time T, will be queued in the slot:

S = Ib + min((T - Tb) / Tg, Ns - 1);

Items with T at least Ns*Tg past Tb will be queued in the last relative slot 
(choosing a proper Ns and Tg can minimize this).
Queueing is O(1) and de-queueing is O(Ns). You can tune Ns and Tg to 
suit your needs.
Here is a simple benchmark comparing time-ring (TR) and CFS queueing:

http://www.xmailserver.org/smart-queue.c

On my box (Dual Opteron 252):

[EMAIL PROTECTED]:~$ ./smart-queue -n 8
CFS = 142.21 cycles/loop
TR  = 72.33 cycles/loop
[EMAIL PROTECTED]:~$ ./smart-queue -n 16
CFS = 188.74 cycles/loop
TR  = 83.79 cycles/loop
[EMAIL PROTECTED]:~$ ./smart-queue -n 32
CFS = 221.36 cycles/loop
TR  = 75.93 cycles/loop
[EMAIL PROTECTED]:~$ ./smart-queue -n 64
CFS = 242.89 cycles/loop
TR  = 81.29 cycles/loop

- Davide


Re: [RFC 1/1] Char: mxser_new, fix recursive locking

2007-04-14 Thread Jiri Slaby


On 4/14/07, Jan Yenya Kasprzak <[EMAIL PROTECTED]> wrote:

Jiri Slaby wrote:
:  ioctl(fd, TIOCMIWAIT, TIOCM_CD);

[...]

Hmm, I have tried to run this, and got a machine lockup, and after
a minute or so the following has been printed to the console:


Hmm, the driver got shot full of holes; there's a missing schedule() in the
ioctl of both drivers. I'll post a patch in the morning or during the
day.

thanks for now,
--
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E

Re: [PATCH 10/28] i386: map enough initial memory to create lowmem mappings

2007-04-14 Thread H. Peter Anvin


Jeremy Fitzhardinge wrote:

-
+LOW_PAGES = 1<<(32-PAGE_SHIFT_asm)
+


Again, for debugging... it would be interesting to replace this with:

LOW_PAGES = (0x100000000-__PAGE_OFFSET) >> PAGE_SHIFT_asm

... to smoke out further problems; this will take the strict definition 
of "lowmem" (modulo the pci region, which someone unwisely made 
dynamically adjustable... which would have been fine if initramfs could 
live in highmem, which it can't yet.)


-hpa

Re: Linux 2.6.21-rc6

2007-04-14 Thread Rafael J. Wysocki

On Saturday, 14 April 2007 23:35, Tobias Diedrich wrote:
> Rafael J. Wysocki wrote:
> > On Saturday, 14 April 2007 21:56, Tobias Diedrich wrote:
> > > Rafael J. Wysocki wrote:
> > > > On Saturday, 14 April 2007 15:00, Adrian Bunk wrote:
> > > > > On Sat, Apr 14, 2007 at 02:31:54PM +0200, Tobias Diedrich wrote:
> > > > > > Tobias Diedrich wrote:
> > > > > > > > ed746e3b18f4df18afa3763155972c5835f284c5 is first bad commit
> > > > > > > > commit ed746e3b18f4df18afa3763155972c5835f284c5
> > > > > > > > Author: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > > > > > > > Date:   Sat Feb 10 01:43:32 2007 -0800
> > > > > > > > 
> > > > > > > > [PATCH] swsusp: Change code ordering in disk.c
> > > > > > > > 
> > > > > > > > Change the ordering of code in kernel/power/disk.c so that device_suspend() is
> > > > > > > > called before disable_nonboot_cpus() and platform_finish() is called after
> > > > > > > > enable_nonboot_cpus() and before device_resume(), as indicated by the recent
> > > > > > > > discussion on Linux-PM (cf.
> > > > > > > > http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html).
> > > > > > > > 
> > > > > > > > The changes here only affect the built-in swsusp.
> > > > > > > > 
> > > > > > > > [EMAIL PROTECTED]: fix LED blinking during image load]
> > > > > > > > Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > > > > > > > Acked-by: Pavel Machek <[EMAIL PROTECTED]>
> > > > > > > > Cc: Greg KH <[EMAIL PROTECTED]>
> > > > > > > > Cc: Nigel Cunningham <[EMAIL PROTECTED]>
> > > > > > > > Cc: Patrick Mochel <[EMAIL PROTECTED]>
> > > > > > > > Cc: Alexey Starikovskiy <[EMAIL PROTECTED]>
> > > > > > > > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> > > > > > > > Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
> > > > > > > > 
> > > > > > > > :04 04 7eca5b3a8f9606bc4f2ff41192ec8c9d4ca90d18 8313b674e1d1bdf6849350af06d28a89b3bb3054 M  kernel
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Now, the remaining test is to try reverting this commit from -rc6. :)
> > > > > > > 
> > > > > > > Doesn't apply cleanly against -rc6, but fixes the problem when
> > > > > > > reverted from -rc1.
> > > > > > 
> > > > > > Now, this was already reported in
> > > > > > http://lkml.org/lkml/2007/3/16/126
> > > > > > and I even flagged that message in my local folder, but apparently forgot
> > > > > > to follow up on it... *sigh*
> > > > > 
> > > > > Unless I misunderstood something, all of the problems Maxim described in this
> > > > > email are fixed for him in -rc6.
> > > > > 
> > > > > But it's quite possible that you are running into a different issue 
> > > > > exposed by this commit.
> > > > 
> > > > Yes, it's likely.
> > > > 
> > > > Tobias, I'm unable to reproduce the problem with your .config, but my hardware
> > > > is certainly different.  Which suspend mode do you use?  If that's "platform",
> > > > can you try to use "shutdown" or "reboot" and see if that helps?
> > > 
> > > Sure.
> > > shutdown/reboot works fine, only platform is broken.
> > 
> > Thanks.
> > 
> > Now, I suspect the problem is somehow related to the hardware, so it would help
> > a lot if we could identify the piece of hardware (or driver) involved.
> > 
> > AFAICT, your system is a non-SMP one, so we can rule out
> > disable/enable_nonboot_cpus().  To confirm that the problem is related to
> > platform_finish(), can you please apply the appended debug patch and
> > see if the suspend in the 'platform' mode works with it?
> 
> Yes, it's an Asus M2N-SLI-Deluxe Mainboard with an Athlon64 3200+
> single core CPU.
> 
> > Also, would it be feasible for you to use 'shutdown' as a workaround in case
> > the source of the problem is difficult to find and/or fix?
> 
> I guess so, but the below patch fixes the problem. :)

Well, I thought it would, but it also would break some other people's systems.
That's the _real_ problem.  Let's see if we can learn more.

Can you please revert it for now, apply the appended one and try to
suspend/resume twice in the 'platform' mode (it may or may not work)?

Rafael

---
 kernel/power/disk.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

Index: linux-2.6.21-rc6/kernel/power/disk.c
===
--- linux-2.6.21-rc6.orig/kernel/power/disk.c
+++ linux-2.6.21-rc6/kernel/power/disk.c
@@ -267,12 +267,15 @@ static int software_resume(void)
error = swsusp_read();
if (error) {
swsusp_free();
-   platform_finish();
goto Thaw;
}
 
pr_debug("PM: Preparing devices for restore.\n");
 
+   error = platform_prepare();
+   if (error)
+   goto Thaw;
+
suspend_console();
error =

Question about documentation for do_wait syscall (and flags) and clone...

2007-04-14 Thread yantux

Hello community.

I use 2.6.17 gentoo.

Could you please point me to documentation that describes the __WALL and
__WNOTHREAD flags for the do_wait syscall?

Can the CLONE_THREAD flag (passed to the do_fork syscall) affect the do_wait
syscall when it is called with the __WNOTHREAD (or __WALL) flag?

Best regards,
yantux.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 2.6.21-rc6

2007-04-14 Thread Tobias Diedrich

Rafael J. Wysocki wrote:
> On Saturday, 14 April 2007 21:56, Tobias Diedrich wrote:
> > Rafael J. Wysocki wrote:
> > > On Saturday, 14 April 2007 15:00, Adrian Bunk wrote:
> > > > On Sat, Apr 14, 2007 at 02:31:54PM +0200, Tobias Diedrich wrote:
> > > > > Tobias Diedrich wrote:
> > > > > > > ed746e3b18f4df18afa3763155972c5835f284c5 is first bad commit
> > > > > > > commit ed746e3b18f4df18afa3763155972c5835f284c5
> > > > > > > Author: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > > > > > > Date:   Sat Feb 10 01:43:32 2007 -0800
> > > > > > > 
> > > > > > > [PATCH] swsusp: Change code ordering in disk.c
> > > > > > > 
> > > > > > > Change the ordering of code in kernel/power/disk.c so that device_suspend() is
> > > > > > > called before disable_nonboot_cpus() and platform_finish() is called after
> > > > > > > enable_nonboot_cpus() and before device_resume(), as indicated by the recent
> > > > > > > discussion on Linux-PM (cf.
> > > > > > > 
> > > > > > > http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html).
> > > > > > > 
> > > > > > > The changes here only affect the built-in swsusp.
> > > > > > > 
> > > > > > > [EMAIL PROTECTED]: fix LED blinking during image load]
> > > > > > > Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > > > > > > Acked-by: Pavel Machek <[EMAIL PROTECTED]>
> > > > > > > Cc: Greg KH <[EMAIL PROTECTED]>
> > > > > > > Cc: Nigel Cunningham <[EMAIL PROTECTED]>
> > > > > > > Cc: Patrick Mochel <[EMAIL PROTECTED]>
> > > > > > > Cc: Alexey Starikovskiy <[EMAIL PROTECTED]>
> > > > > > > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> > > > > > > Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
> > > > > > > 
> > > > > > > :04 04 7eca5b3a8f9606bc4f2ff41192ec8c9d4ca90d18 
> > > > > > > 8313b674e1d1bdf6849350af06d28a89b3bb3054 M  kernel
> > > > > > > 
> > > > > > > 
> > > > > > > Now, the remaining test is to try reverting this commit from 
> > > > > > > -rc6. :)
> > > > > > 
> > > > > > Doesn't apply cleanly against -rc6, but fixes the problem when
> > > > > > reverted from -rc1.
> > > > > 
> > > > > Now, this was already reported in
> > > > > http://lkml.org/lkml/2007/3/16/126
> > > > > and I even flagged that message in my local folder, but apparently forgot
> > > > > to follow up on it... *sigh*
> > > > 
> > > > Unless I misunderstood something, all of the problems Maxim described in this
> > > > email are fixed for him in -rc6.
> > > > 
> > > > But it's quite possible that you are running into a different issue 
> > > > exposed by this commit.
> > > 
> > > Yes, it's likely.
> > > 
> > > Tobias, I'm unable to reproduce the problem with your .config, but my hardware
> > > is certainly different.  Which suspend mode do you use?  If that's "platform",
> > > can you try to use "shutdown" or "reboot" and see if that helps?
> > 
> > Sure.
> > shutdown/reboot works fine, only platform is broken.
> 
> Thanks.
> 
> Now, I suspect the problem is somehow related to the hardware, so it would help
> a lot if we could identify the piece of hardware (or driver) involved.
> 
> AFAICT, your system is a non-SMP one, so we can rule out
> disable/enable_nonboot_cpus().  To confirm that the problem is related to
> platform_finish(), can you please apply the appended debug patch and
> see if the suspend in the 'platform' mode works with it?

Yes, it's an Asus M2N-SLI-Deluxe Mainboard with an Athlon64 3200+
single core CPU.

> Also, would it be feasible for you to use 'shutdown' as a workaround in case
> the source of the problem is difficult to find and/or fix?

I guess so, but the below patch fixes the problem. :)

> ---
>  kernel/power/disk.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6.21-rc6/kernel/power/disk.c
> ===
> --- linux-2.6.21-rc6.orig/kernel/power/disk.c
> +++ linux-2.6.21-rc6/kernel/power/disk.c
> @@ -170,8 +170,8 @@ int pm_suspend_disk(void)
>  
>   if (in_suspend) {
>   enable_nonboot_cpus();
> - platform_finish();
>   device_resume();
> + platform_finish();
>   resume_console();
>   pr_debug("PM: writing image.\n");
>   error = swsusp_write();
> @@ -189,8 +189,8 @@ int pm_suspend_disk(void)
>   Enable_cpus:
>   enable_nonboot_cpus();
>   Resume_devices:
> - platform_finish();
>   device_resume();
> + platform_finish();
>   resume_console();
>   Thaw:
>   unprepare_processes();
> 

-- 
Tobias  PGP: http://9ac7e0bc.uguu.de
This mail is made from 100% recycled bits.

[PATCH 26/28] From: Andrew Morton <[EMAIL PROTECTED]>

2007-04-14 Thread Jeremy Fitzhardinge

x86_64:

arch/x86_64/kernel/../../i386/kernel/alternative.c: In function 'alternative_instructions':
arch/x86_64/kernel/../../i386/kernel/alternative.c:374: error: '__parainstructions' undeclared (first use in this function)
arch/x86_64/kernel/../../i386/kernel/alternative.c:374: error: (Each undeclared identifier is reported only once
arch/x86_64/kernel/../../i386/kernel/alternative.c:374: error: for each function it appears in.)
arch/x86_64/kernel/../../i386/kernel/alternative.c:374: error: '__parainstructions_end' undeclared (first use in this function)

Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---
 include/asm-x86_64/alternative.h |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

===
--- a/include/asm-x86_64/alternative.h~rename-the-parainstructions-symbols-to-be-consistent-with-the-others-fix
+++ a/include/asm-x86_64/alternative.h
@@ -141,8 +141,8 @@ void apply_paravirt(struct paravirt_patc
 static inline void
 apply_paravirt(struct paravirt_patch *start, struct paravirt_patch *end)
 {}
-#define __start_parainstructions NULL
-#define __stop_parainstructions NULL
+#define __parainstructions NULL
+#define __parainstructions_end NULL
 #endif
 
 #endif /* _X86_64_ALTERNATIVE_H */
_

-- 


[PATCH 19/28] Dont implement native_kmap_atomic_pte for !HIGHPTE

2007-04-14 Thread Jeremy Fitzhardinge

Don't implement native_kmap_atomic_pte for the !HIGHPTE case; it is never
needed, never called, and leaving it in is just plain confusing.  Restricting
it to the config where it is used may help find bugs.

From: Zachary Amsden <[EMAIL PROTECTED]>
Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>
Acked-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

---
 arch/i386/kernel/paravirt.c |4 +---
 include/asm-i386/highmem.h  |5 -
 include/asm-i386/paravirt.h |4 
 3 files changed, 5 insertions(+), 8 deletions(-)

===
--- a/arch/i386/kernel/paravirt.c
+++ b/arch/i386/kernel/paravirt.c
@@ -318,9 +318,7 @@ struct paravirt_ops paravirt_ops = {
.ptep_get_and_clear = native_ptep_get_and_clear,
 
 #ifdef CONFIG_HIGHPTE
-   .kmap_atomic_pte = native_kmap_atomic_pte,
-#else
-   .kmap_atomic_pte = paravirt_nop,
+   .kmap_atomic_pte = kmap_atomic,
 #endif
 
 #ifdef CONFIG_X86_PAE
===
--- a/include/asm-i386/highmem.h
+++ b/include/asm-i386/highmem.h
@@ -74,11 +74,6 @@ void *kmap_atomic_pfn(unsigned long pfn,
 void *kmap_atomic_pfn(unsigned long pfn, enum km_type type);
 struct page *kmap_atomic_to_page(void *ptr);
 
-static inline void *native_kmap_atomic_pte(struct page *page, enum km_type type)
-{
-   return kmap_atomic(page, type);
-}
-
 #ifndef CONFIG_PARAVIRT
 #define kmap_atomic_pte(page, type)kmap_atomic(page, type)
 #endif
===
--- a/include/asm-i386/paravirt.h
+++ b/include/asm-i386/paravirt.h
@@ -190,7 +190,9 @@ struct paravirt_ops
 
pte_t (*ptep_get_and_clear)(pte_t *ptep);
 
+#ifdef CONFIG_HIGHPTE
void *(*kmap_atomic_pte)(struct page *page, enum km_type type);
+#endif
 
 #ifdef CONFIG_X86_PAE
void (*set_pte_atomic)(pte_t *ptep, pte_t pteval);
@@ -759,12 +761,14 @@ static inline void paravirt_release_pd(u
PVOP_VCALL1(release_pd, pfn);
 }
 
+#ifdef CONFIG_HIGHPTE
 static inline void *kmap_atomic_pte(struct page *page, enum km_type type)
 {
unsigned long ret;
ret = PVOP_CALL2(unsigned long, kmap_atomic_pte, page, type);
return (void *)ret;
 }
+#endif
 
 static inline void pte_update(struct mm_struct *mm, unsigned long addr,
  pte_t *ptep)

-- 


[PATCH 20/28] Now that the VDSO can be relocated, we can support it in VMI configurations.

2007-04-14 Thread Jeremy Fitzhardinge

From: Zachary Amsden <[EMAIL PROTECTED]>
Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

---
 arch/i386/Kconfig |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

===
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -220,7 +220,7 @@ config PARAVIRT
 
 config VMI
bool "VMI Paravirt-ops support"
-   depends on PARAVIRT && !COMPAT_VDSO
+   depends on PARAVIRT
help
  VMI provides a paravirtualized interface to the VMware ESX server
  (it could be used by other hypervisors in theory too, but is not

-- 


[PATCH 22/28] Convert VMI timer to use clock events

2007-04-14 Thread Jeremy Fitzhardinge

Convert VMI timer to use clock events, making it properly able to use the NO_HZ
infrastructure.  On UP systems, with no local APIC, we just continue to route
these events through the PIT.  On systems with a local APIC, or SMP, we provide
a single source interrupt chip which creates the local timer IRQ.  It actually
gets delivered by the APIC hardware, but we don't want to use the same local
APIC clocksource processing, so we create our own handler here.

From: Zachary Amsden <[EMAIL PROTECTED]>
Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>
CC: Dan Hecht <[EMAIL PROTECTED]>
CC: Ingo Molnar <[EMAIL PROTECTED]>
CC: Thomas Gleixner <[EMAIL PROTECTED]>

---
 arch/i386/kernel/Makefile   |2 
 arch/i386/kernel/entry.S|5 
 arch/i386/kernel/vmi.c  |   26 --
 arch/i386/kernel/vmiclock.c |  318 
 arch/i386/kernel/vmitime.c  |  482 ---
 include/asm-i386/vmi_time.h |   18 -
 6 files changed, 327 insertions(+), 524 deletions(-)

===
--- a/arch/i386/kernel/Makefile
+++ b/arch/i386/kernel/Makefile
@@ -41,7 +41,7 @@ obj-$(CONFIG_K8_NB)   += k8.o
 obj-$(CONFIG_K8_NB)+= k8.o
 obj-$(CONFIG_STACK_UNWIND) += unwind.o
 
-obj-$(CONFIG_VMI)  += vmi.o vmitime.o
+obj-$(CONFIG_VMI)  += vmi.o vmiclock.o
 obj-$(CONFIG_PARAVIRT) += paravirt.o
 obj-y  += pcspeaker.o
 
===
--- a/arch/i386/kernel/vmi.c
+++ b/arch/i386/kernel/vmi.c
@@ -73,6 +73,9 @@ static struct {
void (*set_lazy_mode)(int mode);
 } vmi_ops;
 
+/* Cached VMI operations */
+struct vmi_timer_ops vmi_timer_ops;
+
 /*
  * VMI patching routines.
  */
@@ -231,18 +234,6 @@ static void vmi_nop(void)
 {
 }
 
-/* For NO_IDLE_HZ, we stop the clock when halting the kernel */
-static fastcall void vmi_safe_halt(void)
-{
-   int idle = vmi_stop_hz_timer();
-   vmi_ops.halt();
-   if (idle) {
-   local_irq_disable();
-   vmi_account_time_restart_hz_timer();
-   local_irq_enable();
-   }
-}
-
 #ifdef CONFIG_DEBUG_PAGE_TYPE
 
 #ifdef CONFIG_X86_PAE
@@ -714,7 +705,6 @@ do {
\
vmi_ops.cache = (void *)rel->eip;   \
}   \
 } while (0)
-
 
 /*
  * Activate the VMI interface and switch into paravirtualized mode
@@ -894,8 +884,8 @@ static inline int __init activate_vmi(vo
paravirt_ops.get_wallclock = vmi_get_wallclock;
paravirt_ops.set_wallclock = vmi_set_wallclock;
 #ifdef CONFIG_X86_LOCAL_APIC
-   paravirt_ops.setup_boot_clock = vmi_timer_setup_boot_alarm;
-   paravirt_ops.setup_secondary_clock = vmi_timer_setup_secondary_alarm;
+   paravirt_ops.setup_boot_clock = vmi_time_bsp_init;
+   paravirt_ops.setup_secondary_clock = vmi_time_ap_init;
 #endif
paravirt_ops.get_scheduled_cycles = vmi_get_sched_cycles;
paravirt_ops.get_cpu_khz = vmi_cpu_khz;
@@ -907,11 +897,7 @@ static inline int __init activate_vmi(vo
disable_vmi_timer = 1;
}
 
-   /* No idle HZ mode only works if VMI timer and no idle is enabled */
-   if (disable_noidle || disable_vmi_timer)
-   para_fill(safe_halt, Halt);
-   else
-   para_wrap(safe_halt, vmi_safe_halt, halt, Halt);
+   para_fill(safe_halt, Halt);
 
/*
 * Alternative instruction rewriting doesn't happen soon enough
===
--- a/include/asm-i386/vmi_time.h
+++ b/include/asm-i386/vmi_time.h
@@ -53,22 +53,8 @@ extern unsigned long vmi_cpu_khz(void);
 extern unsigned long vmi_cpu_khz(void);
 
 #ifdef CONFIG_X86_LOCAL_APIC
-extern void __init vmi_timer_setup_boot_alarm(void);
-extern void __devinit vmi_timer_setup_secondary_alarm(void);
-extern void apic_vmi_timer_interrupt(void);
-#endif
-
-#ifdef CONFIG_NO_IDLE_HZ
-extern int vmi_stop_hz_timer(void);
-extern void vmi_account_time_restart_hz_timer(void);
-#else
-static inline int vmi_stop_hz_timer(void)
-{
-   return 0;
-}
-static inline void vmi_account_time_restart_hz_timer(void)
-{
-}
+extern void __devinit vmi_time_bsp_init(void);
+extern void __devinit vmi_time_ap_init(void);
 #endif
 
 /*
===
--- a/arch/i386/kernel/entry.S
+++ b/arch/i386/kernel/entry.S
@@ -637,11 +637,6 @@ ENDPROC(name)
 /* The include is where all of the SMP etc. interrupts come from */
 #include "entry_arch.h"
 
-/* This alternate entry is needed because we hijack the apic LVTT */
-#if defined(CONFIG_VMI) && defined(CONFIG_X86_LOCAL_APIC)
-BUILD_INTERRUPT(apic_vmi_timer_interrupt,LOCAL_TIMER_VECTOR)
-#endif
-
 KPROBE_ENTRY(page_fault)

[PATCH 14/28] fix paravirt-documentation

2007-04-14 Thread Jeremy Fitzhardinge

Remove #defines, add enum for PARAVIRT_LAZY_FLUSH.
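A small illustration of why a single enum is preferable to keeping the enum and a duplicate set of #defines: with the macros gone, the names stay symbolic, so a function can take enum paravirt_lazy_mode and the compiler can check a switch over it (gcc's -Wswitch, enabled by -Wall, warns about unhandled enumerators when there is no default label). The enum body repeats the values from the patch; lazy_mode_name() is a hypothetical helper for illustration, not a kernel function.

```c
/* Lazy modes as a single enum, with the values from the patch. */
enum paravirt_lazy_mode {
	PARAVIRT_LAZY_NONE = 0,
	PARAVIRT_LAZY_MMU = 1,
	PARAVIRT_LAZY_CPU = 2,
	PARAVIRT_LAZY_FLUSH = 3,
};

/* Hypothetical helper, not a kernel function.  Because the parameter has
 * the enum type and the switch has no default label, gcc -Wswitch warns
 * if an enumerator is ever added without a matching case. */
const char *lazy_mode_name(enum paravirt_lazy_mode mode)
{
	switch (mode) {
	case PARAVIRT_LAZY_NONE:
		return "none";
	case PARAVIRT_LAZY_MMU:
		return "mmu";
	case PARAVIRT_LAZY_CPU:
		return "cpu";
	case PARAVIRT_LAZY_FLUSH:
		return "flush";
	}
	return "unknown";	/* reached only for out-of-range values */
}
```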

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

---
 include/asm-i386/paravirt.h |7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

===
--- a/include/asm-i386/paravirt.h
+++ b/include/asm-i386/paravirt.h
@@ -30,6 +30,7 @@ enum paravirt_lazy_mode {
PARAVIRT_LAZY_NONE = 0,
PARAVIRT_LAZY_MMU = 1,
PARAVIRT_LAZY_CPU = 2,
+   PARAVIRT_LAZY_FLUSH = 3,
 };
 
 struct paravirt_ops
@@ -906,12 +907,6 @@ static inline void set_pmd(pmd_t *pmdp, 
 }
 #endif /* CONFIG_X86_PAE */
 
-/* Lazy mode for batching updates / context switch */
-#define PARAVIRT_LAZY_NONE 0
-#define PARAVIRT_LAZY_MMU  1
-#define PARAVIRT_LAZY_CPU  2
-#define PARAVIRT_LAZY_FLUSH 3
-
 #define  __HAVE_ARCH_ENTER_LAZY_CPU_MODE
 static inline void arch_enter_lazy_cpu_mode(void)
 {

-- 


[PATCH 04/28] deflate stack usage in lib/inflate.c

2007-04-14 Thread Jeremy Fitzhardinge

inflate_fixed and huft_build together use around 2.7k of stack.  When
using 4k stacks, I saw stack overflows from interrupts arriving while
unpacking the root initrd:

do_IRQ: stack overflow: 384
 [] show_trace_log_lvl+0x1a/0x30
 [] show_trace+0x12/0x14
 [] dump_stack+0x16/0x18
 [] do_IRQ+0x6d/0xd9
 [] xen_evtchn_do_upcall+0x6e/0xa2
 [] xen_hypervisor_callback+0x25/0x2c
 [] xen_restore_fl+0x27/0x29
 [] _spin_unlock_irqrestore+0x4a/0x50
 [] change_page_attr+0x577/0x584
 [] kernel_map_pages+0x8d/0xb4
 [] cache_alloc_refill+0x53f/0x632
 [] __kmalloc+0xc1/0x10d
 [] malloc+0x10/0x12
 [] huft_build+0x2a7/0x5fa
 [] inflate_fixed+0x91/0x136
 [] unpack_to_rootfs+0x5f2/0x8c1
 [] populate_rootfs+0x1e/0xe4

(This was under Xen, but there's no reason it couldn't happen on bare hardware.)

This patch mallocs the local variables, thereby reducing the stack
usage to sane levels.

Also, up the heap size for the kernel decompressor to deal with the
extra allocation.
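A rough user-space sketch of the technique, assuming nothing about the real huft_build() beyond the table names: the large scratch arrays are grouped into one struct and obtained with a single malloc(), so the function's own stack frame stays small, and every exit path frees the block. sort_by_length() is a hypothetical helper that performs only the first step of building a canonical Huffman table (ordering symbols by code length); BMAX and N_MAX use the same values as lib/inflate.c.

```c
#include <stdlib.h>
#include <string.h>

#define BMAX 16		/* maximum code length (same value as lib/inflate.c) */
#define N_MAX 288	/* maximum number of codes (same value as lib/inflate.c) */

/* Scratch tables that would otherwise be large locals on the stack. */
struct scratch_tables {
	unsigned c[BMAX + 1];	/* number of codes of each bit length */
	unsigned v[N_MAX];	/* symbol indices in order of bit length */
	unsigned x[BMAX + 1];	/* start offset in v[] for each bit length */
};

/*
 * Hypothetical helper: write the indices of all symbols with a nonzero
 * code length to out[], ordered by length (ties keep index order).
 * Returns the number written, or -1 on bad input or allocation failure.
 */
int sort_by_length(const unsigned *lengths, unsigned n, unsigned *out)
{
	struct scratch_tables *t;
	unsigned i, k, count = 0;

	if (n > N_MAX)
		return -1;
	t = malloc(sizeof(*t));		/* one heap block instead of three stack arrays */
	if (t == NULL)
		return -1;

	memset(t->c, 0, sizeof(t->c));
	for (i = 0; i < n; i++) {
		if (lengths[i] > BMAX) {
			free(t);	/* error paths must free the block too */
			return -1;
		}
		t->c[lengths[i]]++;
	}

	/* x[k] = where codes of length k start in v[] */
	t->x[1] = 0;
	for (k = 1; k < BMAX; k++)
		t->x[k + 1] = t->x[k] + t->c[k];

	for (i = 0; i < n; i++) {
		if (lengths[i] != 0) {
			t->v[t->x[lengths[i]]++] = i;
			count++;
		}
	}

	memcpy(out, t->v, count * sizeof(*out));
	free(t);
	return (int)count;
}
```

The cost of this approach is heap space instead of stack space, which is why the patch also raises HEAP_SIZE in each boot decompressor's misc.c.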

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Tim Yamin <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Matt Mackall <[EMAIL PROTECTED]>
Cc: Ivan Kokshaysky <[EMAIL PROTECTED]>
Cc: Richard Henderson <[EMAIL PROTECTED]>
Cc: Russell King <[EMAIL PROTECTED]>
Cc: Ian Molton <[EMAIL PROTECTED]>

---
 arch/alpha/boot/misc.c |2 -
 arch/arm/boot/compressed/misc.c|2 -
 arch/arm26/boot/compressed/misc.c  |2 -
 arch/i386/boot/compressed/misc.c   |2 -
 arch/x86_64/boot/compressed/misc.c |2 -
 lib/inflate.c  |   66 ++--
 6 files changed, 54 insertions(+), 22 deletions(-)

===
--- a/arch/alpha/boot/misc.c
+++ b/arch/alpha/boot/misc.c
@@ -98,7 +98,7 @@ static ulg free_mem_ptr;
 static ulg free_mem_ptr;
 static ulg free_mem_ptr_end;
 
-#define HEAP_SIZE 0x2000
+#define HEAP_SIZE 0x3000
 
 #include "../../../lib/inflate.c"
 
===
--- a/arch/arm/boot/compressed/misc.c
+++ b/arch/arm/boot/compressed/misc.c
@@ -239,7 +239,7 @@ static ulg free_mem_ptr;
 static ulg free_mem_ptr;
 static ulg free_mem_ptr_end;
 
-#define HEAP_SIZE 0x2000
+#define HEAP_SIZE 0x3000
 
 #include "../../../../lib/inflate.c"
 
===
--- a/arch/arm26/boot/compressed/misc.c
+++ b/arch/arm26/boot/compressed/misc.c
@@ -182,7 +182,7 @@ static ulg free_mem_ptr;
 static ulg free_mem_ptr;
 static ulg free_mem_ptr_end;
 
-#define HEAP_SIZE 0x2000
+#define HEAP_SIZE 0x3000
 
 #include "../../../../lib/inflate.c"
 
===
--- a/arch/i386/boot/compressed/misc.c
+++ b/arch/i386/boot/compressed/misc.c
@@ -189,7 +189,7 @@ static unsigned long free_mem_ptr;
 static unsigned long free_mem_ptr;
 static unsigned long free_mem_end_ptr;
 
-#define HEAP_SIZE 0x3000
+#define HEAP_SIZE 0x4000
 
 static char *vidmem = (char *)0xb8000;
 static int vidport;
===
--- a/arch/x86_64/boot/compressed/misc.c
+++ b/arch/x86_64/boot/compressed/misc.c
@@ -189,7 +189,7 @@ static long free_mem_ptr;
 static long free_mem_ptr;
 static long free_mem_end_ptr;
 
-#define HEAP_SIZE 0x6000
+#define HEAP_SIZE 0x7000
 
 static char *vidmem = (char *)0xb8000;
 static int vidport;
===
--- a/lib/inflate.c
+++ b/lib/inflate.c
@@ -292,7 +292,6 @@ STATIC int INIT huft_build(
oversubscribed set of lengths), and three if not enough memory. */
 {
   unsigned a;   /* counter for codes of length k */
-  unsigned c[BMAX+1];   /* bit length count table */
   unsigned f;   /* i repeats in table every f entries */
   int g;/* maximum code length */
   int h;/* table level */
@@ -303,18 +302,33 @@ STATIC int INIT huft_build(
   register unsigned *p; /* pointer into c[], b[], or v[] */
   register struct huft *q;  /* points to current table */
   struct huft r;/* table entry for structure assignment */
-  struct huft *u[BMAX]; /* table stack */
-  unsigned v[N_MAX];/* values in order of bit length */
   register int w;   /* bits before this table == (l * h) */
-  unsigned x[BMAX+1];   /* bit offsets, then code stack */
   unsigned *xp; /* pointer into x */
   int y;/* number of dummy codes added */
   unsigned z;   /* number of entries in current table */
+  struct {
+unsigned c[BMAX+1];   /* bit length count table */
+struct huft *u[BMAX]; /* table stack */
+unsigned v[N_MAX];/* values in order of bit length */
+unsigned x[BMAX+1];   /* bit offsets, then code stack */
+  }

[PATCH 25/28] From: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

2007-04-14 Thread Jeremy Fitzhardinge

The other symbols used to delineate the alt-instructions sections have the
form __foo/__foo_end.  Rename parainstructions to match.
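The __foo/__foo_end pair marks a half-open range [start, end) of records that the linker script collects into one section; code that consumes the section, such as apply_paravirt(__parainstructions, __parainstructions_end), can then visit every record with an ordinary pointer loop. A minimal user-space sketch, in which a plain array stands in for the linker-generated section and struct patch_site is a simplified, hypothetical record type:

```c
#include <stddef.h>

/* Simplified, hypothetical patch-site record; the kernel's real record
 * type carries more fields. */
struct patch_site {
	unsigned type;	/* which operation this site calls */
	unsigned len;	/* bytes available for patching */
};

/* A plain array stands in for the .parainstructions output section.  In
 * vmlinux.lds.S the linker defines the start symbol just before the
 * collected records and the _end symbol just after them. */
struct patch_site demo_sites[] = { { 1, 5 }, { 2, 7 }, { 1, 3 } };
#define demo_sites_start (&demo_sites[0])
#define demo_sites_end (&demo_sites[sizeof(demo_sites) / sizeof(demo_sites[0])])

/* Walk the half-open range [start, end).  Returns the total patchable
 * length of the sites with the given type. */
size_t total_len_of_type(const struct patch_site *start,
			 const struct patch_site *end, unsigned type)
{
	const struct patch_site *p;
	size_t total = 0;

	for (p = start; p < end; p++)
		if (p->type == type)
			total += p->len;
	return total;
}
```

In the !CONFIG_PARAVIRT stubs both symbols are defined as NULL, but the stub apply_paravirt() there has an empty body, so the range is never walked.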

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---
 arch/i386/kernel/alternative.c |2 +-
 arch/i386/kernel/vmi.c |6 +++---
 arch/i386/kernel/vmlinux.lds.S |4 ++--
 include/asm-i386/alternative.h |4 ++--
 4 files changed, 8 insertions(+), 8 deletions(-)

===
--- a/arch/i386/kernel/alternative.c
+++ b/arch/i386/kernel/alternative.c
@@ -386,6 +386,6 @@ void __init alternative_instructions(voi
alternatives_smp_switch(0);
}
 #endif
-   apply_paravirt(__start_parainstructions, __stop_parainstructions);
+   apply_paravirt(__parainstructions, __parainstructions_end);
local_irq_restore(flags);
 }
===
--- a/arch/i386/kernel/vmi.c
+++ b/arch/i386/kernel/vmi.c
@@ -74,8 +74,8 @@ static struct {
 } vmi_ops;
 
 /* XXX move this to alternative.h */
-extern struct paravirt_patch __start_parainstructions[],
-   __stop_parainstructions[];
+extern struct paravirt_patch __parainstructions[],
+   __parainstructions_end[];
 
 /* Cached VMI operations */
 struct vmi_timer_ops vmi_timer_ops;
@@ -909,7 +909,7 @@ static inline int __init activate_vmi(vo
 * to do this before IRQs get reenabled.  Fortunately, it is
 * idempotent.
 */
-   apply_paravirt(__start_parainstructions, __stop_parainstructions);
+   apply_paravirt(__parainstructions, __parainstructions_end);
 
vmi_bringup();
 
===
--- a/arch/i386/kernel/vmlinux.lds.S
+++ b/arch/i386/kernel/vmlinux.lds.S
@@ -166,9 +166,9 @@ SECTIONS
   }
   . = ALIGN(4);
   .parainstructions : AT(ADDR(.parainstructions) - LOAD_OFFSET) {
-   __start_parainstructions = .;
+   __parainstructions = .;
*(.parainstructions)
-   __stop_parainstructions = .;
+   __parainstructions_end = .;
   }
   /* .exit.text is discard at runtime, not link time, to deal with references
  from .altinstructions and .eh_frame */
===
--- a/include/asm-i386/alternative.h
+++ b/include/asm-i386/alternative.h
@@ -124,8 +124,8 @@ apply_paravirt(struct paravirt_patch_sit
 apply_paravirt(struct paravirt_patch_site *start,
   struct paravirt_patch_site *end)
 {}
-#define __start_parainstructions NULL
-#define __stop_parainstructions NULL
+#define __parainstructions NULL
+#define __parainstructions_end NULL
 #endif
 
 #endif /* _I386_ALTERNATIVE_H */

-- 


[PATCH 21/28] Implement vmi_kmap_atomic_pte

2007-04-14 Thread Jeremy Fitzhardinge

Implement vmi_kmap_atomic_pte in terms of the backend set_linear_mapping
operation.  The conversion is rather straightforward: call kmap_atomic
and then inform the hypervisor of the page mapping.
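The shape of vmi_kmap_atomic_pte() is "map, then notify": perform the ordinary mapping, report the new virtual-address-to-frame binding to the backend, and return the address. A user-space sketch of that wrapper pattern, in which demo_kmap(), the notify_fn callback, record_notify() and the two fake windows are all hypothetical stand-ins for kmap_atomic(), the SetLinearMapping backend call and the KM_PTE0/KM_PTE1 slots:

```c
#include <stddef.h>

/* Hypothetical backend hook, using the (slot, va, count, pfn) argument
 * order of set_linear_mapping in the patch. */
typedef void (*notify_fn)(int slot, void *va, unsigned count,
			  unsigned long pfn);

/* Two fake mapping windows standing in for the KM_PTE0/KM_PTE1 slots. */
static char pte_windows[2][4096];

/* Stand-in for kmap_atomic(): hands back the window for this slot. */
static void *demo_kmap(int slot)
{
	return pte_windows[slot];
}

/* Map first, then tell the backend which virtual address now covers
 * which page frame, and return the address.  Backend slot 0 is left for
 * the lowmem mapping, so window n is reported as slot n + 1, mirroring
 * (type - KM_PTE0) + 1 in the patch. */
void *demo_kmap_and_notify(int slot, unsigned long pfn, notify_fn notify)
{
	void *va;

	if (slot < 0 || slot > 1)	/* the patch BUG()s on other slots */
		return NULL;
	va = demo_kmap(slot);
	if (notify)
		notify(slot + 1, va, 1, pfn);
	return va;
}

/* Demo backend that records the most recent notification. */
int last_slot = -1;
void *last_va;
unsigned long last_pfn;

void record_notify(int slot, void *va, unsigned count, unsigned long pfn)
{
	(void)count;
	last_slot = slot;
	last_va = va;
	last_pfn = pfn;
}
```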

The _flush_tlb damage is due to macros being pulled in from highmem.h.

From: Zachary Amsden <[EMAIL PROTECTED]>
Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

---
 arch/i386/kernel/vmi.c |   38 --
 1 file changed, 24 insertions(+), 14 deletions(-)

===
--- a/arch/i386/kernel/vmi.c
+++ b/arch/i386/kernel/vmi.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -65,8 +66,8 @@ static struct {
void (*release_page)(u32, u32);
void (*set_pte)(pte_t, pte_t *, unsigned);
void (*update_pte)(pte_t *, unsigned);
-   void (*set_linear_mapping)(int, u32, u32, u32);
-   void (*flush_tlb)(int);
+   void (*set_linear_mapping)(int, void *, u32, u32);
+   void (*_flush_tlb)(int);
void (*set_initial_ap_state)(int, int);
void (*halt)(void);
void (*set_lazy_mode)(int mode);
@@ -217,12 +218,12 @@ static void vmi_load_esp0(struct tss_str
 
 static void vmi_flush_tlb_user(void)
 {
-   vmi_ops.flush_tlb(VMI_FLUSH_TLB);
+   vmi_ops._flush_tlb(VMI_FLUSH_TLB);
 }
 
 static void vmi_flush_tlb_kernel(void)
 {
-   vmi_ops.flush_tlb(VMI_FLUSH_TLB | VMI_FLUSH_GLOBAL);
+   vmi_ops._flush_tlb(VMI_FLUSH_TLB | VMI_FLUSH_GLOBAL);
 }
 
 /* Stub to do nothing at all; used for delays and unimplemented calls */
@@ -345,8 +346,11 @@ static void vmi_check_page_type(u32 pfn,
 #define vmi_check_page_type(p,t) do { } while (0)
 #endif
 
-static void vmi_map_pt_hook(int type, pte_t *va, u32 pfn)
-{
+#ifdef CONFIG_HIGHPTE
+static void *vmi_kmap_atomic_pte(struct page *page, enum km_type type)
+{
+   void *va = kmap_atomic(page, type);
+
/*
 * Internally, the VMI ROM must map virtual addresses to physical
 * addresses for processing MMU updates.  By the time MMU updates
@@ -360,8 +364,11 @@ static void vmi_map_pt_hook(int type, pt
 *  args: SLOT VACOUNT PFN
 */
BUG_ON(type != KM_PTE0 && type != KM_PTE1);
-   vmi_ops.set_linear_mapping((type - KM_PTE0)+1, (u32)va, 1, pfn);
-}
+   vmi_ops.set_linear_mapping((type - KM_PTE0)+1, va, 1, page_to_pfn(page));
+
+   return va;
+}
+#endif
 
 static void vmi_allocate_pt(u32 pfn)
 {
@@ -656,7 +663,7 @@ void vmi_bringup(void)
 {
/* We must establish the lowmem mapping for MMU ops to work */
if (vmi_ops.set_linear_mapping)
-   vmi_ops.set_linear_mapping(0, __PAGE_OFFSET, max_low_pfn, 0);
+   vmi_ops.set_linear_mapping(0, (void *)__PAGE_OFFSET, max_low_pfn, 0);
 }
 
 /*
@@ -793,8 +800,8 @@ static inline int __init activate_vmi(vo
para_wrap(set_lazy_mode, vmi_set_lazy_mode, set_lazy_mode, SetLazyMode);
 
/* user and kernel flush are just handled with different flags to FlushTLB */
-   para_wrap(flush_tlb_user, vmi_flush_tlb_user, flush_tlb, FlushTLB);
-   para_wrap(flush_tlb_kernel, vmi_flush_tlb_kernel, flush_tlb, FlushTLB);
+   para_wrap(flush_tlb_user, vmi_flush_tlb_user, _flush_tlb, FlushTLB);
+   para_wrap(flush_tlb_kernel, vmi_flush_tlb_kernel, _flush_tlb, FlushTLB);
para_fill(flush_tlb_single, InvalPage);
 
/*
@@ -840,9 +847,12 @@ static inline int __init activate_vmi(vo
paravirt_ops.release_pt = vmi_release_pt;
paravirt_ops.release_pd = vmi_release_pd;
}
-#if 0
-   para_wrap(map_pt_hook, vmi_map_pt_hook, set_linear_mapping,
- SetLinearMapping);
+
+   /* Set linear is needed in all cases */
+   vmi_ops.set_linear_mapping = vmi_get_function(VMI_CALL_SetLinearMapping);
+#ifdef CONFIG_HIGHPTE
+   if (vmi_ops.set_linear_mapping)
+   paravirt_ops.kmap_atomic_pte = vmi_kmap_atomic_pte;
 #endif
 
/*

-- 


[PATCH 16/28] Remove a warning about an unused variable in !CONFIG_ACPI compilation.

2007-04-14 Thread Jeremy Fitzhardinge

From: Zachary Amsden <[EMAIL PROTECTED]>
Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>
CC: Trivial <[EMAIL PROTECTED]>

---
 arch/i386/kernel/acpi/earlyquirk.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

===
--- a/arch/i386/kernel/acpi/earlyquirk.c
+++ b/arch/i386/kernel/acpi/earlyquirk.c
@@ -21,8 +21,8 @@ static int __init nvidia_hpet_check(stru
 
 static int __init check_bridge(int vendor, int device)
 {
+#ifdef CONFIG_ACPI
static int warned;
-#ifdef CONFIG_ACPI
/* According to Nvidia all timer overrides are bogus unless HPET
   is enabled. */
if (!acpi_use_timer_override && vendor == PCI_VENDOR_ID_NVIDIA) {

-- 


[PATCH 23/28] Fix BusLogic to stop using check_region

2007-04-14 Thread Jeremy Fitzhardinge

I got so sick of seeing the check_region warnings from BusLogic.c that I
actually fixed it properly.  Never use check_region; instead, reserve the
region with request_region before the probe and check the result, and free
the region if setup fails.  This should be functionally identical to the
original except that it fixes the potential race.
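The race being fixed is check-then-use: check_region() only reports that a port range is free at that instant, and another driver can claim it before the probe runs. Reserving first closes that window. A minimal user-space model of the reserve, probe, release-on-failure sequence; claim_ports(), release_ports(), probe_adapter() and the demo probes are hypothetical stand-ins for request_region(), release_region() and the real hardware probe, and the 4-port range width is chosen only for the demo.

```c
#include <stdbool.h>

#define NPORTS 0x400		/* size of the toy I/O port space */
static bool port_busy[NPORTS];

/* Check and mark the range in one step.  In the kernel, request_region()
 * does the equivalent under a lock and returns NULL if the range is busy;
 * doing both in one call is what removes the check-then-use window. */
bool claim_ports(unsigned start, unsigned n)
{
	unsigned i;

	if (start > NPORTS || n > NPORTS - start)
		return false;
	for (i = 0; i < n; i++)
		if (port_busy[start + i])
			return false;
	for (i = 0; i < n; i++)
		port_busy[start + i] = true;
	return true;
}

void release_ports(unsigned start, unsigned n)
{
	unsigned i;

	for (i = 0; i < n && start + i < NPORTS; i++)
		port_busy[start + i] = false;
}

/* Reserve first, then probe; give the range back if the probe fails.
 * Returns 0 on success (range stays reserved), -1 if the range was
 * already in use, -2 if no adapter answered. */
int probe_adapter(unsigned io, unsigned n, bool (*probe_ok)(unsigned io))
{
	if (!claim_ports(io, n))
		return -1;
	if (!probe_ok(io)) {
		release_ports(io, n);
		return -2;
	}
	return 0;
}

/* Demo probes: one "finds" an adapter, the other does not. */
bool probe_present(unsigned io)
{
	(void)io;
	return true;
}

bool probe_absent(unsigned io)
{
	(void)io;
	return false;
}
```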

From: Zachary Amsden <[EMAIL PROTECTED]>
Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>
CC: Leonard N. Zubkoff <[EMAIL PROTECTED]>
CC: Michael Clay <[EMAIL PROTECTED]>

---
 drivers/scsi/BusLogic.c |   73 ++-
 1 file changed, 48 insertions(+), 25 deletions(-)

===
--- a/drivers/scsi/BusLogic.c
+++ b/drivers/scsi/BusLogic.c
@@ -579,17 +579,17 @@ static void __init BusLogic_InitializePr
/*
   Append the list of standard BusLogic MultiMaster ISA I/O Addresses.
 */
-   if (BusLogic_ProbeOptions.LimitedProbeISA ? BusLogic_ProbeOptions.Probe330 : check_region(0x330, BusLogic_MultiMasterAddressCount) == 0)
+   if (!BusLogic_ProbeOptions.LimitedProbeISA || BusLogic_ProbeOptions.Probe330)
BusLogic_AppendProbeAddressISA(0x330);
-   if (BusLogic_ProbeOptions.LimitedProbeISA ? BusLogic_ProbeOptions.Probe334 : check_region(0x334, BusLogic_MultiMasterAddressCount) == 0)
+   if (!BusLogic_ProbeOptions.LimitedProbeISA || BusLogic_ProbeOptions.Probe334)
BusLogic_AppendProbeAddressISA(0x334);
-   if (BusLogic_ProbeOptions.LimitedProbeISA ? BusLogic_ProbeOptions.Probe230 : check_region(0x230, BusLogic_MultiMasterAddressCount) == 0)
+   if (!BusLogic_ProbeOptions.LimitedProbeISA || BusLogic_ProbeOptions.Probe230)
BusLogic_AppendProbeAddressISA(0x230);
-   if (BusLogic_ProbeOptions.LimitedProbeISA ? BusLogic_ProbeOptions.Probe234 : check_region(0x234, BusLogic_MultiMasterAddressCount) == 0)
+   if (!BusLogic_ProbeOptions.LimitedProbeISA || BusLogic_ProbeOptions.Probe234)
BusLogic_AppendProbeAddressISA(0x234);
-   if (BusLogic_ProbeOptions.LimitedProbeISA ? BusLogic_ProbeOptions.Probe130 : check_region(0x130, BusLogic_MultiMasterAddressCount) == 0)
+   if (!BusLogic_ProbeOptions.LimitedProbeISA || BusLogic_ProbeOptions.Probe130)
BusLogic_AppendProbeAddressISA(0x130);
-   if (BusLogic_ProbeOptions.LimitedProbeISA ? BusLogic_ProbeOptions.Probe134 : check_region(0x134, BusLogic_MultiMasterAddressCount) == 0)
+   if (!BusLogic_ProbeOptions.LimitedProbeISA || BusLogic_ProbeOptions.Probe134)
BusLogic_AppendProbeAddressISA(0x134);
 }
 
@@ -795,7 +795,9 @@ static int __init BusLogic_InitializeMul
   host adapters are probed.
 */
if (!BusLogic_ProbeOptions.NoProbeISA)
-   if (PrimaryProbeInfo->IO_Address == 0 && (BusLogic_ProbeOptions.LimitedProbeISA ? BusLogic_ProbeOptions.Probe330 : check_region(0x330, BusLogic_MultiMasterAddressCount) == 0)) {
+   if (PrimaryProbeInfo->IO_Address == 0 &&
+   (!BusLogic_ProbeOptions.LimitedProbeISA ||
+BusLogic_ProbeOptions.Probe330)) {
PrimaryProbeInfo->HostAdapterType = BusLogic_MultiMaster;
PrimaryProbeInfo->HostAdapterBusType = BusLogic_ISA_Bus;
PrimaryProbeInfo->IO_Address = 0x330;
@@ -805,15 +807,25 @@ static int __init BusLogic_InitializeMul
   omitting the Primary I/O Address which has already been handled.
 */
if (!BusLogic_ProbeOptions.NoProbeISA) {
-   if (!StandardAddressSeen[1] && (BusLogic_ProbeOptions.LimitedProbeISA ? BusLogic_ProbeOptions.Probe334 : check_region(0x334, BusLogic_MultiMasterAddressCount) == 0))
+   if (!StandardAddressSeen[1] &&
+   (!BusLogic_ProbeOptions.LimitedProbeISA ||
+BusLogic_ProbeOptions.Probe334))
BusLogic_AppendProbeAddressISA(0x334);
-   if (!StandardAddressSeen[2] && (BusLogic_ProbeOptions.LimitedProbeISA ? BusLogic_ProbeOptions.Probe230 : check_region(0x230, BusLogic_MultiMasterAddressCount) == 0))
+   if (!StandardAddressSeen[2] &&
+   (!BusLogic_ProbeOptions.LimitedProbeISA ||
+BusLogic_ProbeOptions.Probe230))
BusLogic_AppendProbeAddressISA(0x230);
-   if (!StandardAddressSeen[3] && (BusLogic_ProbeOptions.LimitedProbeISA ? BusLogic_ProbeOptions.Probe234 : check_region(0x234, BusLogic_MultiMasterAddressCount) == 0))
+   if (!StandardAddressSeen[3] &&
+   (!BusLogic_ProbeOptions.LimitedProbeISA ||
+BusLogic_ProbeOptions.Probe234))
BusLogic_AppendProbeAddressISA(0x234);
-   if (!StandardAddressSeen[4] &&

[PATCH 12/28] i386: now it's ok to use identify_boot_cpu

2007-04-14 Thread Jeremy Fitzhardinge

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

---
 arch/i386/kernel/cpu/bugs.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

===
--- a/arch/i386/kernel/cpu/bugs.c
+++ b/arch/i386/kernel/cpu/bugs.c
@@ -177,7 +177,7 @@ static void __init check_config(void)
 
 void __init check_bugs(void)
 {
-   identify_cpu(&boot_cpu_data);
+   identify_boot_cpu();
 #ifndef CONFIG_SMP
printk("CPU: ");
print_cpu_info(&boot_cpu_data);


[PATCH 18/28] Copying of the pgd range must happen under the pgd_lock

2007-04-14 Thread Jeremy Fitzhardinge

Copying of the pgd range must happen under the pgd_lock.  This got broken by
the paravirt changes in the -mm tree.  Badness can result if the pgd is copied
before it has been added to the list: a large page split or rejoin that runs
in between updates every pgd on the list except the new one.
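A minimal sketch of the locking rule, assuming a toy model: `set_kernel_entry()` plays the part of a large-page split or rejoin that rewrites the kernel half of every pgd on the list, and `pgd_ctor()` copies the master entries and links the new pgd under the same lock, as the fixed code does. The structure, the four-entry kernel half and the atomic_flag spinlock are all hypothetical. A single-threaded test cannot reproduce the race itself; it only shows that copy and list insertion form one critical section.

```c
#include <stdatomic.h>
#include <string.h>

#define KERNEL_PTRS 4		/* hypothetical: tiny "kernel half" */

struct pgd {
	unsigned long kernel[KERNEL_PTRS];
	struct pgd *next;
};

static atomic_flag pgd_lock = ATOMIC_FLAG_INIT;
static unsigned long swapper_kernel[KERNEL_PTRS];	/* master copy */
static struct pgd *pgd_list;

static void pgd_lock_acquire(void)
{
	while (atomic_flag_test_and_set(&pgd_lock)) {
		/* spin */
	}
}

static void pgd_lock_release(void)
{
	atomic_flag_clear(&pgd_lock);
}

/* Analogue of a large-page split/rejoin: change the master entry and
 * propagate it to every pgd currently on the list. */
static void set_kernel_entry(int idx, unsigned long val)
{
	pgd_lock_acquire();
	swapper_kernel[idx] = val;
	for (struct pgd *p = pgd_list; p; p = p->next)
		p->kernel[idx] = val;
	pgd_lock_release();
}

/* Analogue of pgd_ctor() after the fix: the copy and the list insertion
 * are one critical section, so no update can fall between them. */
static void pgd_ctor(struct pgd *pgd)
{
	pgd_lock_acquire();
	memcpy(pgd->kernel, swapper_kernel, sizeof(swapper_kernel));
	pgd->next = pgd_list;
	pgd_list = pgd;
	pgd_lock_release();
}
```

With the old ordering (copy, then lock, then add), a set_kernel_entry() landing between the memcpy and the list insertion would update every pgd except the one being constructed.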

From: Zachary Amsden <[EMAIL PROTECTED]>
Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>
Acked-by: William Lee Irwin III <[EMAIL PROTECTED]>
---
 arch/i386/mm/pgtable.c |8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

===
--- a/arch/i386/mm/pgtable.c
+++ b/arch/i386/mm/pgtable.c
@@ -241,18 +241,16 @@ void pgd_ctor(void *pgd, struct kmem_cac
/* !PAE, no pagetable sharing */
memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
 
+   spin_lock_irqsave(&pgd_lock, flags);
+
+   /* must happen under lock */
clone_pgd_range((pgd_t *)pgd + USER_PTRS_PER_PGD,
swapper_pg_dir + USER_PTRS_PER_PGD,
KERNEL_PGD_PTRS);
-
-   spin_lock_irqsave(&pgd_lock, flags);
-
-   /* must happen under lock */
paravirt_alloc_pd_clone(__pa(pgd) >> PAGE_SHIFT,
__pa(swapper_pg_dir) >> PAGE_SHIFT,
USER_PTRS_PER_PGD,
KERNEL_PGD_PTRS);
-
pgd_list_add(pgd);
spin_unlock_irqrestore(&pgd_lock, flags);
 }


[PATCH 17/28] x86: cleanup arch/i386/kernel/cpu/mcheck/p4.c

2007-04-14 Thread Jeremy Fitzhardinge

No, just no.  You do not use goto to skip a code block.  You do not
return an obvious variable from a singly-inlined function and give
the function a return value.  You don't put unexplained comments
about kmalloc in code which doesn't do dynamic allocation.  And
you don't leave stray warnings around for no good reason.

Also, when possible, it is better to use block-scoped variables,
because gcc can sometimes generate better code.
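The shape the cleanup moves to can be shown with a hypothetical miniature: the getter becomes a void function that only fills in its argument, the caller tests the count itself, and the `dbg` struct is declared inside the block that uses it. `ext_regs`, `get_extended_regs()`, `num_extended_regs` and `report()` are invented for the sketch, and the register values are constants standing in for rdmsr() reads.

```c
#include <stdio.h>

/* Hypothetical miniature of the p4.c change. */
struct ext_regs {
	unsigned int eax, eip, eflags;
};

static int num_extended_regs = 1;	/* stand-in for mce_num_extended_msrs */

/* No return value and no goto: it just fills in the caller's struct. */
static inline void get_extended_regs(struct ext_regs *r)
{
	r->eax = 0x1;			/* stand-ins for the rdmsr() reads */
	r->eip = 0xc0100000;
	r->eflags = 0x246;
}

/* Returns the number of characters formatted, 0 if nothing to report. */
static int report(char *buf, size_t len)
{
	int n = 0;

	if (num_extended_regs > 0) {	/* the caller decides */
		struct ext_regs dbg;	/* block scoped: only lives here */

		get_extended_regs(&dbg);
		n = snprintf(buf, len, "EIP: %08x EFLAGS: %08x",
			     dbg.eip, dbg.eflags);
	}
	return n;
}
```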

From: Zachary Amsden <[EMAIL PROTECTED]>
Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

---
 arch/i386/kernel/cpu/mcheck/p4.c |   16 
 1 file changed, 4 insertions(+), 12 deletions(-)

===
--- a/arch/i386/kernel/cpu/mcheck/p4.c
+++ b/arch/i386/kernel/cpu/mcheck/p4.c
@@ -124,12 +124,9 @@ static void intel_init_thermal(struct cp
 
 
 /* P4/Xeon Extended MCE MSR retrieval, return 0 if unsupported */
-static inline int intel_get_extended_msrs(struct intel_mce_extended_msrs *r)
+static inline void intel_get_extended_msrs(struct intel_mce_extended_msrs *r)
 {
u32 h;
-
-   if (mce_num_extended_msrs == 0)
-   goto done;
 
rdmsr (MSR_IA32_MCG_EAX, r->eax, h);
rdmsr (MSR_IA32_MCG_EBX, r->ebx, h);
@@ -141,12 +138,6 @@ static inline int intel_get_extended_msr
rdmsr (MSR_IA32_MCG_ESP, r->esp, h);
rdmsr (MSR_IA32_MCG_EFLAGS, r->eflags, h);
rdmsr (MSR_IA32_MCG_EIP, r->eip, h);
-
-   /* can we rely on kmalloc to do a dynamic
-* allocation for the reserved registers?
-*/
-done:
-   return mce_num_extended_msrs;
 }
 
 static fastcall void intel_machine_check(struct pt_regs * regs, long error_code)
@@ -155,7 +146,6 @@ static fastcall void intel_machine_check
u32 alow, ahigh, high, low;
u32 mcgstl, mcgsth;
int i;
-   struct intel_mce_extended_msrs dbg;
 
rdmsr (MSR_IA32_MCG_STATUS, mcgstl, mcgsth);
if (mcgstl & (1<<0))/* Recoverable ? */
@@ -164,7 +154,9 @@ static fastcall void intel_machine_check
printk (KERN_EMERG "CPU %d: Machine Check Exception: %08x%08x\n",
smp_processor_id(), mcgsth, mcgstl);
 
-   if (intel_get_extended_msrs(&dbg)) {
+   if (mce_num_extended_msrs > 0) {
+   struct intel_mce_extended_msrs dbg;
+   intel_get_extended_msrs(&dbg);
printk (KERN_DEBUG "CPU %d: EIP: %08x EFLAGS: %08x\n",
smp_processor_id(), dbg.eip, dbg.eflags);
printk (KERN_DEBUG "\teax: %08x ebx: %08x ecx: %08x edx: %08x\n",


[PATCH 10/28] i386: map enough initial memory to create lowmem mappings

2007-04-14 Thread Jeremy Fitzhardinge

head.S creates the very initial pagetable for the kernel.  This just
maps enough space for the kernel itself, and an allocation bitmap.
The amount of mapped memory is rounded up to 4Mbytes, and so this
typically ends up mapping 8Mbytes of memory.

When booting, pagetable_init() needs to create mappings for all
lowmem, and the pagetables for these mappings are allocated from the
free pages around the kernel in low memory.  If the number of
pagetable pages + kernel size exceeds head.S's initial mapping, it
will end up faulting on an unmapped page.  This will only happen with
specific combinations of kernel size and memory size.

This patch makes sure that head.S also maps enough space to fit the
kernel pagetables as well as the kernel itself.  It ends up using an
additional two pages of unreclaimable memory.
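The sizing arithmetic in the new head.S comment can be checked with a short C transcription of the assembler expressions. PAGE_SHIFT = 12 and the PTRS_PER_* values (PTRS_PER_PMD = 1 and 1024 PGD entries without PAE; 512 PMD and 4 PGD entries with PAE) are the usual i386 values, passed in as parameters here rather than taken from asm-offsets. The function name is hypothetical.

```c
/* C transcription of the head.S INIT_MAP_BEYOND_END expression. */
static unsigned long init_map_beyond_end(unsigned int page_shift,
					 unsigned long ptrs_per_pmd,
					 unsigned long ptrs_per_pgd)
{
	unsigned long page_size = 1UL << page_shift;
	unsigned long low_pages = 1UL << (32 - page_shift);	/* LOW_PAGES */
	unsigned long bootbitmap_size = low_pages / 8;	/* one bit per page */
	unsigned long allocator_slop = 4;			/* pages */
	unsigned long page_table_size;				/* pages */

	if (ptrs_per_pmd > 1)
		/* PAE: PTE pages, plus one PMD page per PGD entry */
		page_table_size = low_pages / ptrs_per_pmd + ptrs_per_pgd;
	else
		/* non-PAE: one PTE page per PGD entry */
		page_table_size = low_pages / ptrs_per_pgd;

	return bootbitmap_size + (page_table_size + allocator_slop) * page_size;
}
```

For those inputs the extra mapping works out to 4,341,760 bytes without PAE and 8,552,448 bytes with PAE. The page-table term dominates; the bootmem bitmap contributes only 128K.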

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Acked-by: "H. Peter Anvin" <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Zachary Amsden <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: "Eric W. Biederman" <[EMAIL PROTECTED]>
Cc: Linus Torvalds <[EMAIL PROTECTED]>

---
 arch/i386/kernel/asm-offsets.c |6 ++
 arch/i386/kernel/head.S|   25 -
 2 files changed, 26 insertions(+), 5 deletions(-)

===
--- a/arch/i386/kernel/asm-offsets.c
+++ b/arch/i386/kernel/asm-offsets.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include "sigframe.h"
+#include 
 #include 
 #include 
 #include 
@@ -96,6 +97,11 @@ void foo(void)
 sizeof(struct tss_struct));
 
DEFINE(PAGE_SIZE_asm, PAGE_SIZE);
+   DEFINE(PAGE_SHIFT_asm, PAGE_SHIFT);
+   DEFINE(PTRS_PER_PTE, PTRS_PER_PTE);
+   DEFINE(PTRS_PER_PMD, PTRS_PER_PMD);
+   DEFINE(PTRS_PER_PGD, PTRS_PER_PGD);
+
DEFINE(VDSO_PRELINK_asm, VDSO_PRELINK);
 
OFFSET(crypto_tfm_ctx_offset, crypto_tfm, __crt_ctx);
===
--- a/arch/i386/kernel/head.S
+++ b/arch/i386/kernel/head.S
@@ -34,17 +34,32 @@
 
 /*
  * This is how much memory *in addition to the memory covered up to
- * and including _end* we need mapped initially.  We need one bit for
- * each possible page, but only in low memory, which means
- * 2^32/4096/8 = 128K worst case (4G/4G split.)
+ * and including _end* we need mapped initially.
+ * We need:
+ *  - one bit for each possible page, but only in low memory, which means
+ * 2^32/4096/8 = 128K worst case (4G/4G split.)
+ *  - enough space to map all low memory, which means
+ * (2^32/4096) / 1024 pages (worst case, non PAE)
+ * (2^32/4096) / 512 + 4 pages (worst case for PAE)
+ *  - a few pages for allocator use before the kernel pagetable has
+ * been set up
  *
  * Modulo rounding, each megabyte assigned here requires a kilobyte of
  * memory, which is currently unreclaimed.
  *
  * This should be a multiple of a page.
  */
-#define INIT_MAP_BEYOND_END(128*1024)
-
+LOW_PAGES = 1<<(32-PAGE_SHIFT_asm)
+
+#if PTRS_PER_PMD > 1
+PAGE_TABLE_SIZE = (LOW_PAGES / PTRS_PER_PMD) + PTRS_PER_PGD
+#else
+PAGE_TABLE_SIZE = (LOW_PAGES / PTRS_PER_PGD)
+#endif
+BOOTBITMAP_SIZE = LOW_PAGES / 8
+ALLOCATOR_SLOP = 4
+
+INIT_MAP_BEYOND_END = BOOTBITMAP_SIZE + (PAGE_TABLE_SIZE + ALLOCATOR_SLOP)*PAGE_SIZE_asm
 
 /*
  * 32-bit kernel entrypoint; only used by the boot CPU.  On entry,


[PATCH 15/28] In compat mode, the return value here was uninitialized.

2007-04-14 Thread Jeremy Fitzhardinge

From: Zachary Amsden <[EMAIL PROTECTED]>
Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

---
 arch/i386/kernel/sysenter.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

===
--- a/arch/i386/kernel/sysenter.c
+++ b/arch/i386/kernel/sysenter.c
@@ -254,7 +254,7 @@ int arch_setup_additional_pages(struct l
 {
struct mm_struct *mm = current->mm;
unsigned long addr;
-   int ret;
+   int ret = 0;
bool compat;
 
down_write(&mm->mmap_sem);


[PATCH 06/28] Convert PDA into the percpu section

2007-04-14 Thread Jeremy Fitzhardinge

Currently x86 (similar to x86-64) has a special per-cpu structure
called "i386_pda" which can be easily and efficiently referenced via
the %fs register.  An ELF section is more flexible than a structure,
allowing any piece of code to use this area.  Indeed, such a section
already exists: the per-cpu area.

So this patch:
(1) Removes the PDA and replaces each of its current members with a per-cpu variable.
(2) Replaces the __KERNEL_PDA segment with __KERNEL_PERCPU.
(3) Creates a per-cpu mirror of __per_cpu_offset called this_cpu_off, which
can be used to calculate addresses for this CPU's variables.
(4) Simplifies startup, because %fs doesn't need to be loaded with a
special segment at early boot; it can be deferred until the first
percpu area is allocated (or never for UP).

The result is less code and one less x86-specific concept.
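The per-cpu scheme described above can be modelled in user space. In this hypothetical model, a struct stands in for the per-cpu ELF section, `areas[]` stands in for the per-CPU copies made at boot, and `PER_CPU_PTR()` stands in for the address calculation. `per_cpu_offset[]` and the `this_cpu_off` mirror borrow the names from the patch text. Keeping everything inside one array keeps the pointer arithmetic well-defined in standard C, so offsets here are measured from `areas`, whereas the kernel measures them from the original section.

```c
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical model of the per-cpu section: each former PDA member
 * is now an ordinary per-cpu variable living in this "section". */
struct percpu_section {
	int cpu_number;
	void *current_task;	/* was the PDA's pcurrent */
	size_t this_cpu_off;	/* mirror of per_cpu_offset[] for this CPU */
};

static struct percpu_section areas[NR_CPUS];	/* one copy per CPU */
static size_t per_cpu_offset[NR_CPUS];

/* Address of @member in @cpu's copy: start of the areas + that CPU's
 * offset + the variable's offset inside the section. */
#define PER_CPU_PTR(type, member, cpu)				\
	((type *)((char *)areas + per_cpu_offset[cpu] +		\
		  offsetof(struct percpu_section, member)))

static void setup_per_cpu_areas(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		per_cpu_offset[cpu] = (size_t)cpu * sizeof(struct percpu_section);
		*PER_CPU_PTR(int, cpu_number, cpu) = cpu;
		/* each CPU can find its own offset without the global array */
		*PER_CPU_PTR(size_t, this_cpu_off, cpu) = per_cpu_offset[cpu];
	}
}
```

In the kernel, %fs supplies the running CPU's base instead of an explicit `cpu` argument; the offset arithmetic is the same.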

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
---
 arch/i386/kernel/asm-offsets.c |5 -
 arch/i386/kernel/cpu/common.c  |   17 -
 arch/i386/kernel/entry.S   |5 -
 arch/i386/kernel/head.S|   31 +
 arch/i386/kernel/i386_ksyms.c  |2 
 arch/i386/kernel/irq.c |3 
 arch/i386/kernel/process.c |   12 ++-
 arch/i386/kernel/smpboot.c |   34 --
 arch/i386/kernel/vmi.c |6 -
 arch/i386/kernel/vmlinux.lds.S |1 
 include/asm-i386/current.h |5 -
 include/asm-i386/irq_regs.h|   12 ++-
 include/asm-i386/pda.h |   99 --
 include/asm-i386/percpu.h  |  132 +---
 include/asm-i386/processor.h   |2 
 include/asm-i386/segment.h |6 -
 include/asm-i386/smp.h |4 -
 include/asm-i386/unwind.h  |2 
 18 files changed, 180 insertions(+), 198 deletions(-)

===
--- a/arch/i386/kernel/asm-offsets.c
+++ b/arch/i386/kernel/asm-offsets.c
@@ -15,7 +15,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #define DEFINE(sym, val) \
 asm volatile("\n->" #sym " %0 " #val : : "i" (val))
@@ -101,10 +100,6 @@ void foo(void)
 
OFFSET(crypto_tfm_ctx_offset, crypto_tfm, __crt_ctx);
 
-   BLANK();
-   OFFSET(PDA_cpu, i386_pda, cpu_number);
-   OFFSET(PDA_pcurrent, i386_pda, pcurrent);
-
 #ifdef CONFIG_PARAVIRT
BLANK();
OFFSET(PARAVIRT_enabled, paravirt_ops, paravirt_enabled);
===
--- a/arch/i386/kernel/cpu/common.c
+++ b/arch/i386/kernel/cpu/common.c
@@ -18,7 +18,6 @@
 #include 
 #include 
 #endif
-#include 
 
 #include "cpu.h"
 
@@ -47,12 +46,9 @@ DEFINE_PER_CPU(struct gdt_page, gdt_page
[GDT_ENTRY_APMBIOS_BASE+2] = { 0x, 0x00409200 }, /* data */
 
[GDT_ENTRY_ESPFIX_SS] = { 0x, 0x00c09200 },
-   [GDT_ENTRY_PDA] = { 0x, 0x00c09200 }, /* set in setup_pda */
+   [GDT_ENTRY_PERCPU] = { 0x, 0x },
 } };
 EXPORT_PER_CPU_SYMBOL_GPL(gdt_page);
-
-DEFINE_PER_CPU(struct i386_pda, _cpu_pda);
-EXPORT_PER_CPU_SYMBOL(_cpu_pda);
 
 static int cachesize_override __cpuinitdata = -1;
 static int disable_x86_fxsr __cpuinitdata;
@@ -634,20 +630,13 @@ void __init early_cpu_init(void)
 #endif
 }
 
-/* Make sure %gs is initialized properly in idle threads */
+/* Make sure %fs is initialized properly in idle threads */
 struct pt_regs * __devinit idle_regs(struct pt_regs *regs)
 {
memset(regs, 0, sizeof(struct pt_regs));
-   regs->xfs = __KERNEL_PDA;
+   regs->xfs = __KERNEL_PERCPU;
return regs;
 }
-
-/* Initial PDA used by boot CPU */
-struct i386_pda boot_pda = {
-   ._pda = &boot_pda,
-   .cpu_number = 0,
-   .pcurrent = &init_task,
-};
 
 /*
  * cpu_init() initializes state that is per-CPU. Some data is already
===
--- a/arch/i386/kernel/entry.S
+++ b/arch/i386/kernel/entry.S
@@ -132,7 +132,7 @@ 1:
movl $(__USER_DS), %edx; \
movl %edx, %ds; \
movl %edx, %es; \
-   movl $(__KERNEL_PDA), %edx; \
+   movl $(__KERNEL_PERCPU), %edx; \
movl %edx, %fs
 
 #define RESTORE_INT_REGS \
@@ -556,7 +556,6 @@ END(syscall_badsys)
 
 #define FIXUP_ESPFIX_STACK \
/* since we are on a wrong stack, we cant make it a C code :( */ \
-   movl %fs:PDA_cpu, %ebx; \
PER_CPU(gdt_page, %ebx); \
GET_DESC_BASE(GDT_ENTRY_ESPFIX_SS, %ebx, %eax, %ax, %al, %ah); \
addl %esp, %eax; \
@@ -681,7 +680,7 @@ error_code:
pushl %fs
CFI_ADJUST_CFA_OFFSET 4
/*CFI_REL_OFFSET fs, 0*/
-   movl $(__KERNEL_PDA), %ecx
+   movl $(__KERNEL_PERCPU), %ecx
movl %ecx, %fs
UNWIND_ESPFIX_STACK
popl %ecx
===
--- a/arch/i386/kernel/head.S
+++ b/arch/i386/kernel/head.S
@@ -317,12 +317,12 @@ 2:movl

[PATCH 11/28] x86: incremental update for i386 and x86-64 check_bugs

2007-04-14 Thread Jeremy Fitzhardinge

i386 bugs.c shouldn't refer to identify_boot_cpu yet, since it doesn't
get introduced until the identify_cpu patch.

Remove spurious comments, headers and keywords from x86-64 bugs.[ch].

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

---
 arch/i386/kernel/cpu/bugs.c |2 +-
 arch/x86_64/kernel/bugs.c   |9 +
 include/asm-i386/bugs.h |2 +-
 3 files changed, 3 insertions(+), 10 deletions(-)

===
--- a/arch/i386/kernel/cpu/bugs.c
+++ b/arch/i386/kernel/cpu/bugs.c
@@ -177,7 +177,7 @@ static void __init check_config(void)
 
 void __init check_bugs(void)
 {
-   identify_boot_cpu();
+   identify_cpu(&boot_cpu_data);
 #ifndef CONFIG_SMP
printk("CPU: ");
print_cpu_info(&boot_cpu_data);
===
--- a/arch/x86_64/kernel/bugs.c
+++ b/arch/x86_64/kernel/bugs.c
@@ -3,19 +3,12 @@
  *
  *  Copyright (C) 1994  Linus Torvalds
  *  Copyright (C) 2000  SuSE
- *
- * This is included by init/main.c to check for architecture-dependent bugs.
- *
- * Needs:
- * void check_bugs(void);
  */
 
 #include 
+#include 
 #include 
 #include 
-#include 
-#include 
-#include 
 
 void __init check_bugs(void)
 {
===
--- a/include/asm-i386/bugs.h
+++ b/include/asm-i386/bugs.h
@@ -7,6 +7,6 @@
 #ifndef _ASM_I386_BUG_H
 #define _ASM_I386_BUG_H
 
-extern void __init check_bugs(void);
+void check_bugs(void);
 
 #endif /* _ASM_I386_BUG_H */


[PATCH 13/28] paravirt: flush lazy mmu updates on kunmap_atomic

2007-04-14 Thread Jeremy Fitzhardinge

kunmap_atomic should flush any pending lazy mmu updates, mainly to be
consistent with kmap_atomic, and to preserve its normal behaviour.
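As a rough illustration of why an unmap should flush, here is a hypothetical model of lazy MMU batching. While `lazy_mode` is set, PTE writes are queued and only reach `hw_pte[]` (standing in for what the MMU or hypervisor sees) on a flush. The helper names are invented for the sketch, and it assumes the unmap path clears the PTE. The point is that ending the unmap with a flush, as the map side already does, makes the change visible before the function returns.

```c
#include <stdbool.h>

#define QUEUE_MAX 16

/* Hypothetical model of lazy MMU mode: while lazy, PTE writes are
 * batched in a queue instead of reaching the "hardware" table. */
static unsigned long hw_pte[8];			/* what the MMU sees */
static struct { int idx; unsigned long val; } queue[QUEUE_MAX];
static int queued;
static bool lazy_mode;

static void flush_lazy_mmu(void)
{
	for (int i = 0; i < queued; i++)
		hw_pte[queue[i].idx] = queue[i].val;
	queued = 0;
}

static void set_pte(int idx, unsigned long val)
{
	if (!lazy_mode) {
		hw_pte[idx] = val;
		return;
	}
	if (queued == QUEUE_MAX)
		flush_lazy_mmu();	/* keep updates in order */
	queue[queued].idx = idx;
	queue[queued].val = val;
	queued++;
}

static void kmap_atomic_slot(int idx, unsigned long pfn)
{
	set_pte(idx, pfn | 1);	/* 1: hypothetical "present" bit */
	flush_lazy_mmu();	/* mapping usable immediately */
}

static void kunmap_atomic_slot(int idx)
{
	set_pte(idx, 0);
	flush_lazy_mmu();	/* the added step: unmap takes effect now */
}
```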

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

---
 arch/i386/mm/highmem.c |1 +
 1 file changed, 1 insertion(+)

===
--- a/arch/i386/mm/highmem.c
+++ b/arch/i386/mm/highmem.c
@@ -72,6 +72,7 @@ void kunmap_atomic(void *kvaddr, enum km
 #endif
}
 
+   arch_flush_lazy_mmu_mode();
pagefault_enable();
 }
 

-- 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 09/28] Fix UP gdt bugs

2007-04-14 Thread Jeremy Fitzhardinge

Fixes two problems with the GDT when compiling for uniprocessor:
 - There's no percpu segment, so trying to load its selector into %fs fails.
   Use a null selector instead.
 - The real gdt needs to be loaded at some point.  Do it in cpu_init().
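The UP fix relies on how x86 selectors are built. A selector is the descriptor index shifted left by three, with the table-indicator bit and the two requested-privilege bits below it, so a ring-0 GDT selector is simply index * 8. Selector 0 is the null selector: it can be loaded into %fs without faulting, and only memory accesses through it fault. The sketch assumes GDT_ENTRY_KERNEL_BASE is 12, as in i386 segment.h of this period, and uses a runtime flag in place of the CONFIG_SMP #ifdef so that both cases can be exercised. The helper names are hypothetical.

```c
/* Assumed from i386 segment.h; "+ 15" and "* 8" are from the patch. */
#define GDT_ENTRY_KERNEL_BASE 12
#define GDT_ENTRY_PERCPU (GDT_ENTRY_KERNEL_BASE + 15)

/* selector = (index << 3) | TI | RPL; TI = 0 (GDT), RPL = 0 here. */
static unsigned int percpu_selector(int smp)
{
	return smp ? GDT_ENTRY_PERCPU * 8 : 0;	/* 0: null selector on UP */
}

static unsigned int selector_index(unsigned int sel) { return sel >> 3; }
static unsigned int selector_rpl(unsigned int sel)   { return sel & 3; }
```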

Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>

---
 arch/i386/kernel/cpu/common.c |   13 +
 arch/i386/kernel/smpboot.c|   12 
 include/asm-i386/processor.h  |1 +
 include/asm-i386/segment.h|4 
 4 files changed, 18 insertions(+), 12 deletions(-)

===
--- a/arch/i386/kernel/cpu/common.c
+++ b/arch/i386/kernel/cpu/common.c
@@ -644,6 +644,18 @@ struct pt_regs * __devinit idle_regs(str
return regs;
 }
 
+/* Current gdt points %fs at the "master" per-cpu area: after this,
+ * it's on the real one. */
+void switch_to_new_gdt(void)
+{
+   struct Xgt_desc_struct gdt_descr;
+
+   gdt_descr.address = (long)get_cpu_gdt_table(smp_processor_id());
+   gdt_descr.size = GDT_SIZE - 1;
+   load_gdt(&gdt_descr);
+   asm("mov %0, %%fs" : : "r" (__KERNEL_PERCPU) : "memory");
+}
+
 /*
  * cpu_init() initializes state that is per-CPU. Some data is already
  * initialized (naturally) in the bootstrap process, such as the GDT
@@ -674,6 +688,7 @@ void __cpuinit cpu_init(void)
}
 
load_idt(&idt_descr);
+   switch_to_new_gdt();
 
/*
 * Set up and load the per-CPU TSS and LDT
===
--- a/arch/i386/kernel/smpboot.c
+++ b/arch/i386/kernel/smpboot.c
@@ -1176,18 +1176,6 @@ void __init native_smp_prepare_cpus(unsi
smp_boot_cpus(max_cpus);
 }
 
-/* Current gdt points %fs at the "master" per-cpu area: after this,
- * it's on the real one. */
-static inline void switch_to_new_gdt(void)
-{
-   struct Xgt_desc_struct gdt_descr;
-
-   gdt_descr.address = (long)get_cpu_gdt_table(smp_processor_id());
-   gdt_descr.size = GDT_SIZE - 1;
-   load_gdt(&gdt_descr);
-   asm("mov %0, %%fs" : : "r" (__KERNEL_PERCPU) : "memory");
-}
-
 void __init native_smp_prepare_boot_cpu(void)
 {
unsigned int cpu = smp_processor_id();
===
--- a/include/asm-i386/processor.h
+++ b/include/asm-i386/processor.h
@@ -777,6 +777,7 @@ extern int sysenter_setup(void);
 extern int sysenter_setup(void);
 
 extern void cpu_set_gdt(int);
+extern void switch_to_new_gdt(void);
 extern void cpu_init(void);
 
 #endif /* __ASM_I386_PROCESSOR_H */
===
--- a/include/asm-i386/segment.h
+++ b/include/asm-i386/segment.h
@@ -75,7 +75,11 @@
 #define __ESPFIX_SS (GDT_ENTRY_ESPFIX_SS * 8)
 
 #define GDT_ENTRY_PERCPU   (GDT_ENTRY_KERNEL_BASE + 15)
+#ifdef CONFIG_SMP
 #define __KERNEL_PERCPU (GDT_ENTRY_PERCPU * 8)
+#else
+#define __KERNEL_PERCPU 0
+#endif
 
 #define GDT_ENTRY_DOUBLEFAULT_TSS  31
 


[PATCH 05/28] Page-align the GDT

2007-04-14 Thread Jeremy Fitzhardinge

Xen wants a dedicated page for the GDT.  I believe VMI likes it too.
lguest, KVM and native don't care.

Simple transformation to page-aligned "struct gdt_page".
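Wrapping the array in a struct with a page-sized alignment attribute is what gives each GDT a page of its own, because the compiler pads sizeof up to the alignment. The stand-alone check below assumes i386's 4096-byte pages, 32 GDT entries and 8-byte descriptors (mirrored by a local `struct desc_struct`). `boot_gdt` is a hypothetical instance. Inside the per-cpu area, this alignment only holds if the area itself is page-aligned.

```c
#include <stdint.h>

#define PAGE_SIZE   4096	/* assumed i386 page size */
#define GDT_ENTRIES 32		/* assumed i386 GDT size */

/* Same shape as desc_struct: two 32-bit words per descriptor. */
struct desc_struct {
	uint32_t a, b;
};

/* 256 bytes of descriptors, padded and aligned to a full page. */
struct gdt_page {
	struct desc_struct gdt[GDT_ENTRIES];
} __attribute__((aligned(PAGE_SIZE)));

static struct gdt_page boot_gdt;	/* hypothetical instance */
```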

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
Acked-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

---
 arch/i386/kernel/cpu/common.c |6 +++---
 arch/i386/kernel/entry.S  |2 +-
 arch/i386/kernel/head.S   |2 +-
 arch/i386/kernel/traps.c  |2 +-
 include/asm-i386/desc.h   |9 +++--
 5 files changed, 13 insertions(+), 8 deletions(-)

===
--- a/arch/i386/kernel/cpu/common.c
+++ b/arch/i386/kernel/cpu/common.c
@@ -22,7 +22,7 @@
 
 #include "cpu.h"
 
-DEFINE_PER_CPU(struct desc_struct, cpu_gdt[GDT_ENTRIES]) = {
+DEFINE_PER_CPU(struct gdt_page, gdt_page) = { .gdt = {
[GDT_ENTRY_KERNEL_CS] = { 0x, 0x00cf9a00 },
[GDT_ENTRY_KERNEL_DS] = { 0x, 0x00cf9200 },
[GDT_ENTRY_DEFAULT_USER_CS] = { 0x, 0x00cffa00 },
@@ -48,8 +48,8 @@ DEFINE_PER_CPU(struct desc_struct, cpu_g
 
[GDT_ENTRY_ESPFIX_SS] = { 0x, 0x00c09200 },
[GDT_ENTRY_PDA] = { 0x, 0x00c09200 }, /* set in setup_pda */
-};
-EXPORT_PER_CPU_SYMBOL_GPL(cpu_gdt);
+} };
+EXPORT_PER_CPU_SYMBOL_GPL(gdt_page);
 
 DEFINE_PER_CPU(struct i386_pda, _cpu_pda) = {
._pda = &per_cpu___cpu_pda,
===
--- a/arch/i386/kernel/entry.S
+++ b/arch/i386/kernel/entry.S
@@ -558,7 +558,7 @@ END(syscall_badsys)
 #define FIXUP_ESPFIX_STACK \
/* since we are on a wrong stack, we cant make it a C code :( */ \
movl %fs:PDA_cpu, %ebx; \
-   PER_CPU(cpu_gdt, %ebx); \
+   PER_CPU(gdt_page, %ebx); \
GET_DESC_BASE(GDT_ENTRY_ESPFIX_SS, %ebx, %eax, %ax, %al, %ah); \
addl %esp, %eax; \
pushl $__KERNEL_DS; \
===
--- a/arch/i386/kernel/head.S
+++ b/arch/i386/kernel/head.S
@@ -599,7 +599,7 @@ idt_descr:
.word 0 # 32 bit align gdt_desc.address
 ENTRY(early_gdt_descr)
.word GDT_ENTRIES*8-1
-   .long per_cpu__cpu_gdt  /* Overwritten for secondary CPUs */
+   .long per_cpu__gdt_page /* Overwritten for secondary CPUs */
 
 /*
  * The boot_gdt_table must mirror the equivalent in setup.S and is
===
--- a/arch/i386/kernel/traps.c
+++ b/arch/i386/kernel/traps.c
@@ -1037,7 +1037,7 @@ fastcall unsigned long patch_espfix_desc
 fastcall unsigned long patch_espfix_desc(unsigned long uesp,
  unsigned long kesp)
 {
-   struct desc_struct *gdt = __get_cpu_var(cpu_gdt);
+   struct desc_struct *gdt = __get_cpu_var(gdt_page).gdt;
unsigned long base = (kesp - uesp) & -THREAD_SIZE;
unsigned long new_kesp = kesp - base;
unsigned long lim_pages = (new_kesp | (THREAD_SIZE - 1)) >> PAGE_SHIFT;
===
--- a/include/asm-i386/desc.h
+++ b/include/asm-i386/desc.h
@@ -18,10 +18,15 @@ struct Xgt_desc_struct {
unsigned short pad;
 } __attribute__ ((packed));
 
-DECLARE_PER_CPU(struct desc_struct, cpu_gdt[GDT_ENTRIES]);
+struct gdt_page
+{
+   struct desc_struct gdt[GDT_ENTRIES];
+} __attribute__((aligned(PAGE_SIZE)));
+DECLARE_PER_CPU(struct gdt_page, gdt_page);
+
 static inline struct desc_struct *get_cpu_gdt_table(unsigned int cpu)
 {
-   return per_cpu(cpu_gdt, cpu);
+   return per_cpu(gdt_page, cpu).gdt;
 }
 
 extern struct Xgt_desc_struct idt_descr;


[PATCH 03/28] fix allow-percpu-variables-to-be-page-aligned.patch

2007-04-14 Thread Jeremy Fitzhardinge

Make sure allocation is page-aligned.
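Rounding each CPU's slice up to PAGE_SIZE keeps the slices page-aligned only if the block they are carved from starts on a page boundary. alloc_bootmem() only promises cache-line alignment, which is why the allocation switches to alloc_bootmem_pages(). The user-space analogue below assumes a 4096-byte page and uses C11 aligned_alloc() in place of the bootmem allocator. `alloc_percpu_block()` is a hypothetical name.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE ((size_t)4096)			/* assumed */
#define ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))	/* a: power of two */

/* One block split into nr_cpus slices of *slice bytes each. */
static unsigned char *alloc_percpu_block(size_t room, unsigned int nr_cpus,
					 size_t *slice)
{
	size_t size = ALIGN(room, PAGE_SIZE);
	/* page-aligned start; total size is a multiple of PAGE_SIZE */
	unsigned char *ptr = aligned_alloc(PAGE_SIZE, size * nr_cpus);

	if (ptr)
		memset(ptr, 0, size * nr_cpus);	/* bootmem memory is zeroed */
	*slice = size;
	return ptr;
}
```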

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

---
 init/main.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

===
--- a/init/main.c
+++ b/init/main.c
@@ -370,7 +370,7 @@ static void __init setup_per_cpu_areas(v
 
/* Copy section for each CPU (we discard the original) */
size = ALIGN(PERCPU_ENOUGH_ROOM, PAGE_SIZE);
-   ptr = alloc_bootmem(size * nr_possible_cpus);
+   ptr = alloc_bootmem_pages(size * nr_possible_cpus);
 
for_each_possible_cpu(i) {
__per_cpu_offset[i] = ptr - __per_cpu_start;


[PATCH 02/28] Account for module percpu space separately from kernel percpu

2007-04-14 Thread Jeremy Fitzhardinge

Rather than using a single constant PERCPU_ENOUGH_ROOM, compute it as
the size of the kernel's static percpu data plus PERCPU_MODULE_RESERVE.  This is now common
to all architectures; if an architecture wants to set
PERCPU_ENOUGH_ROOM to something special, then it may do so (ia64 is
the only one which does).
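The new accounting can be checked with plain numbers. The room left for modules comes out as exactly PERCPU_MODULE_RESERVE, whatever the kernel's own percpu data size. With the old fixed 32768, by contrast, a large enough kernel would drive pcpu_size[1] negative. The sizes below are illustrative numbers, not measurements, and the function names are hypothetical.

```c
/* Hypothetical: the section size is a plain number here instead of
 * __per_cpu_end - __per_cpu_start. */
static long percpu_enough_room(long kernel_percpu, int modules)
{
	long module_reserve = modules ? 8192 : 0;	/* PERCPU_MODULE_RESERVE */

	return kernel_percpu + module_reserve;
}

/* Mirrors percpu_modinit(): block 0 is the kernel's static data, stored
 * negated to mark it used; block 1 is what is left for modules. */
static long module_free_room(long kernel_percpu, int modules)
{
	long used = -kernel_percpu;			/* pcpu_size[0] */

	return percpu_enough_room(kernel_percpu, modules) + used; /* pcpu_size[1] */
}
```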

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Eric W. Biederman <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>

---
 include/asm-alpha/percpu.h   |   14 --
 include/asm-sparc64/percpu.h |   10 --
 include/asm-x86_64/percpu.h  |   10 --
 include/linux/percpu.h   |9 -
 kernel/module.c  |2 +-
 5 files changed, 9 insertions(+), 36 deletions(-)

===
--- a/include/asm-alpha/percpu.h
+++ b/include/asm-alpha/percpu.h
@@ -1,19 +1,5 @@
 #ifndef __ALPHA_PERCPU_H
 #define __ALPHA_PERCPU_H
-
-/*
- * Increase the per cpu area for Alpha so that
- * modules using percpu area can load.
- */
-#ifdef CONFIG_MODULES
-# define PERCPU_MODULE_RESERVE 8192
-#else
-# define PERCPU_MODULE_RESERVE 0
-#endif
-
-#define PERCPU_ENOUGH_ROOM \
-   (ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES) + \
-PERCPU_MODULE_RESERVE)
 
 #include 
 
===
--- a/include/asm-sparc64/percpu.h
+++ b/include/asm-sparc64/percpu.h
@@ -4,16 +4,6 @@
 #include 
 
 #ifdef CONFIG_SMP
-
-#ifdef CONFIG_MODULES
-# define PERCPU_MODULE_RESERVE 8192
-#else
-# define PERCPU_MODULE_RESERVE 0
-#endif
-
-#define PERCPU_ENOUGH_ROOM \
-   (ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES) + \
-PERCPU_MODULE_RESERVE)
 
 extern void setup_per_cpu_areas(void);
 
===
--- a/include/asm-x86_64/percpu.h
+++ b/include/asm-x86_64/percpu.h
@@ -10,16 +10,6 @@
 #ifdef CONFIG_SMP
 
 #include 
-
-#ifdef CONFIG_MODULES
-# define PERCPU_MODULE_RESERVE 8192
-#else
-# define PERCPU_MODULE_RESERVE 0
-#endif
-
-#define PERCPU_ENOUGH_ROOM \
-   (ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES) + \
-PERCPU_MODULE_RESERVE)
 
 #define __per_cpu_offset(cpu) (cpu_pda(cpu)->data_offset)
 #define __my_cpu_offset() read_pda(data_offset)
===
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -11,8 +11,15 @@
 
 /* Enough to cover all DEFINE_PER_CPUs in kernel, including modules. */
 #ifndef PERCPU_ENOUGH_ROOM
-#define PERCPU_ENOUGH_ROOM 32768
+#ifdef CONFIG_MODULES
+#define PERCPU_MODULE_RESERVE  8192
+#else
+#define PERCPU_MODULE_RESERVE  0
 #endif
+
+#define PERCPU_ENOUGH_ROOM \
+   (__per_cpu_end - __per_cpu_start + PERCPU_MODULE_RESERVE)
+#endif /* PERCPU_ENOUGH_ROOM */
 
 /*
  * Must be an lvalue. Since @var must be a simple identifier,
===
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -430,7 +430,7 @@ static int percpu_modinit(void)
pcpu_size = kmalloc(sizeof(pcpu_size[0]) * pcpu_num_allocated,
GFP_KERNEL);
/* Static in-kernel percpu data (used). */
-   pcpu_size[0] = -ALIGN(__per_cpu_end-__per_cpu_start, SMP_CACHE_BYTES);
+   pcpu_size[0] = -(__per_cpu_end-__per_cpu_start);
/* Free room. */
pcpu_size[1] = PERCPU_ENOUGH_ROOM + pcpu_size[0];
if (pcpu_size[1] < 0) {


[PATCH 28/28] Add a sched_clock paravirt_op

2007-04-14 Thread Jeremy Fitzhardinge

The tsc-based get_scheduled_cycles interface is not a good match for
Xen's runstate accounting, which reports everything in nanoseconds.

This patch replaces this interface with a sched_clock interface, which
matches both Xen and VMI's requirements.

In order to do this, we:
   1. replace get_scheduled_cycles with sched_clock
   2. hoist cycles_2_ns into a common header
   3. update vmi accordingly

One thing to note: because sched_clock is implemented as a weak
function in kernel/sched.c, we must define a real function in order to
override this weak binding.  This means the usual paravirt_ops
technique of using an inline function won't work in this case.
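The conversion that cycles_2_ns() performs and the "real function forwarding to an ops table" shape can be sketched together. The sketch assumes the CPU frequency is given in kHz, so one cycle lasts 10^6 / cpu_khz ns, stored with CYC2NS_SCALE_FACTOR = 10 fractional bits. `fake_tsc`, `set_scale()` and `pv_clock_ops` are hypothetical stand-ins for rdtscll(), the frequency resync and paravirt_ops. In the kernel, sched_clock() must be an out-of-line definition because it has to override the weak default in kernel/sched.c, and a header inline cannot do that.

```c
#include <stdint.h>

#define CYC2NS_SCALE_FACTOR 10	/* 2^10, as in the patch */

/* Per-cpu state reduced to the fields cycles_2_ns() uses. */
struct sc_data {
	unsigned int cyc2ns_scale;
	uint64_t sync_tsc;
	uint64_t ns_base;
};

/* ns per cycle = 10^6 / cpu_khz, kept with 10 fractional bits. */
static void set_scale(struct sc_data *sc, unsigned int cpu_khz)
{
	sc->cyc2ns_scale = (1000000u << CYC2NS_SCALE_FACTOR) / cpu_khz;
}

static uint64_t cycles_2_ns(const struct sc_data *sc, uint64_t cyc)
{
	return (((cyc - sc->sync_tsc) * sc->cyc2ns_scale)
		>> CYC2NS_SCALE_FACTOR) + sc->ns_base;
}

static struct sc_data native_sc;
static uint64_t fake_tsc;		/* stand-in for rdtscll() */

static uint64_t native_sched_clock(void)
{
	return cycles_2_ns(&native_sc, fake_tsc);
}

/* Hypothetical paravirt-style dispatch table. */
struct clock_ops {
	uint64_t (*sched_clock)(void);
};

static struct clock_ops pv_clock_ops = { .sched_clock = native_sched_clock };

/* A real function, so it could override a weak default elsewhere. */
uint64_t sched_clock(void)
{
	return pv_clock_ops.sched_clock();
}
```

With a 1 GHz clock the scale is 1024, so 5000 cycles map to 5000 ns. At 2 GHz the scale halves to 512 and the same count maps to 2500 ns. A backend that already reports nanoseconds would simply install a different function pointer.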


Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Zachary Amsden <[EMAIL PROTECTED]>
Cc: Dan Hecht <[EMAIL PROTECTED]>
Cc: john stultz <[EMAIL PROTECTED]>

---
 arch/i386/kernel/paravirt.c|2 -
 arch/i386/kernel/sched-clock.c |   43 ++---
 arch/i386/kernel/vmi.c |2 -
 arch/i386/kernel/vmiclock.c|6 ++---
 include/asm-i386/paravirt.h|7 --
 include/asm-i386/timer.h   |   46 +++-
 include/asm-i386/vmi_time.h|2 -
 7 files changed, 73 insertions(+), 35 deletions(-)

===
--- a/arch/i386/kernel/paravirt.c
+++ b/arch/i386/kernel/paravirt.c
@@ -268,7 +268,7 @@ struct paravirt_ops paravirt_ops = {
.write_msr = native_write_msr_safe,
.read_tsc = native_read_tsc,
.read_pmc = native_read_pmc,
-   .get_scheduled_cycles = native_read_tsc,
+   .sched_clock = native_sched_clock,
.get_cpu_khz = native_calculate_cpu_khz,
.load_tr_desc = native_load_tr_desc,
.set_ldt = native_set_ldt,
===
--- a/arch/i386/kernel/sched-clock.c
+++ b/arch/i386/kernel/sched-clock.c
@@ -35,28 +35,7 @@
  * [EMAIL PROTECTED] "math is hard, lets go shopping!"
  */
 
-#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
-
-struct sc_data {
-   unsigned int cyc2ns_scale;
-   unsigned long long sync_tsc;
-   unsigned long long ns_base;
-   unsigned long long last_val;
-   unsigned long long sync_jiffies;
-};
-
-static DEFINE_PER_CPU(struct sc_data, sc_data);
-
-static inline unsigned long long cycles_2_ns(struct sc_data *sc, unsigned long long cyc)
-{
-   unsigned long long ns;
-
-   cyc -= sc->sync_tsc;
-   ns = (cyc * sc->cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
-   ns += sc->ns_base;
-
-   return ns;
-}
+DEFINE_PER_CPU(struct sc_data, sc_data);
 
 /*
  * Scheduler clock - returns current time in nanosec units.
@@ -66,7 +45,7 @@ static inline unsigned long long cycles_
  * [1] no attempt to stop CPU instruction reordering, which can hit
  * in a 100 instruction window or so.
  */
-unsigned long long sched_clock(void)
+unsigned long long native_sched_clock(void)
 {
unsigned long long r;
struct sc_data *sc = &__get_cpu_var(sc_data);
@@ -81,8 +60,8 @@ unsigned long long sched_clock(void)
sc->last_val = r;
local_irq_restore(flags);
} else {
-   get_scheduled_cycles(r);
-   r = cycles_2_ns(sc, r);
+   rdtscll(r);
+   r = cycles_2_ns(r);
sc->last_val = r;
}
 
@@ -90,6 +69,18 @@ unsigned long long sched_clock(void)
 
return r;
 }
+
+/* We need to define a real function for sched_clock, to override the
+   weak default version */
+#ifdef CONFIG_PARAVIRT
+unsigned long long sched_clock(void)
+{
+   return paravirt_sched_clock();
+}
+#else
+unsigned long long sched_clock(void)
+   __attribute__((alias("native_sched_clock")));
+#endif
 
 /* Resync with new CPU frequency */
 static void resync_sc_freq(struct sc_data *sc, unsigned int newfreq)
@@ -103,7 +94,7 @@ static void resync_sc_freq(struct sc_dat
   because sched_clock callers should be able to tolerate small
   errors. */
sc->ns_base = ktime_to_ns(ktime_get());
-   get_scheduled_cycles(sc->sync_tsc);
+   rdtscll(sc->sync_tsc);
sc->cyc2ns_scale = (1000000 << CYC2NS_SCALE_FACTOR) / newfreq;
 }
 
===
--- a/arch/i386/kernel/vmi.c
+++ b/arch/i386/kernel/vmi.c
@@ -887,7 +887,7 @@ static inline int __init activate_vmi(vo
paravirt_ops.setup_boot_clock = vmi_time_bsp_init;
paravirt_ops.setup_secondary_clock = vmi_time_ap_init;
 #endif
-   paravirt_ops.get_scheduled_cycles = vmi_get_sched_cycles;
+   paravirt_ops.sched_clock = vmi_sched_clock;
paravirt_ops.get_cpu_khz = vmi_cpu_khz;
 
/* We have true wallclock functions; disable CMOS clock sync */
===
--- a/arch/i386/kernel/vmiclock.c
+++

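A rough user-space sketch of the two ideas in this patch: a paravirt_ops-style function pointer that a hypervisor backend can replace (as vmi.c does with sched_clock), and the fixed-point cycles-to-nanoseconds conversion with a 2^10 scale. It assumes the CPU frequency is given in kHz, so one cycle lasts 10^6/khz ns. The struct and function names are hypothetical, the TSC read is replaced by a cycle count passed in, and unlike the kernel's sched_clock(void) the hook takes its inputs explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-point shift: scale values carry 10 fractional bits (2^10). */
#define CYC2NS_SCALE_FACTOR 10

/* Hypothetical per-CPU clock state, loosely modelled on struct sc_data. */
struct clock_state {
	uint64_t cyc2ns_scale; /* nanoseconds per cycle, times 2^10 */
	uint64_t sync_tsc;     /* cycle count at the last resync */
	uint64_t ns_base;      /* nanoseconds at the last resync */
};

/* Recompute the scale after a frequency change. khz is cycles per
 * millisecond, so one cycle lasts 1000000 / khz nanoseconds. */
static void clock_resync(struct clock_state *cs, unsigned int khz,
			 uint64_t now_cycles, uint64_t now_ns)
{
	assert(khz != 0);
	cs->ns_base = now_ns;
	cs->sync_tsc = now_cycles;
	cs->cyc2ns_scale = (1000000ULL << CYC2NS_SCALE_FACTOR) / khz;
}

/* "Native" conversion: ns_base + (cycles since resync) * scale / 2^10.
 * This sketch ignores overflow of the product for very large deltas. */
static uint64_t native_clock_ns(const struct clock_state *cs, uint64_t cyc)
{
	uint64_t delta = cyc - cs->sync_tsc;

	return cs->ns_base + ((delta * cs->cyc2ns_scale) >> CYC2NS_SCALE_FACTOR);
}

/* paravirt_ops-style table: callers always go through the pointer, and a
 * hypervisor backend may replace it at boot. */
struct clock_ops {
	uint64_t (*sched_clock)(const struct clock_state *cs, uint64_t cyc);
};

static struct clock_ops pv_clock_ops = { .sched_clock = native_clock_ns };

static uint64_t sketch_sched_clock(const struct clock_state *cs, uint64_t cyc)
{
	return pv_clock_ops.sched_clock(cs, cyc);
}
```

At 3 GHz the truncated scale (341 rather than 341.33) turns 3000 cycles into 999 ns instead of 1000; a larger shift would be more precise but would leave less headroom before delta * scale overflows.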
[PATCH 08/28] Define per_cpu_offset

2007-04-14 Thread Jeremy Fitzhardinge

Define per_cpu_offset in asm-i386/percpu.h when SMP is defined, as
asm-generic/percpu.h does for UP.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>

---
 include/asm-i386/percpu.h |2 ++
 1 file changed, 2 insertions(+)

===
--- a/include/asm-i386/percpu.h
+++ b/include/asm-i386/percpu.h
@@ -34,6 +34,8 @@
 
 /* This is used for other cpus to find our section. */
 extern unsigned long __per_cpu_offset[];
+
+#define per_cpu_offset(x) (__per_cpu_offset[x])
 
 /* Separate out the type, so (int[3], foo) works. */
 #define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name

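To picture what per_cpu_offset(x) indexes, here is a hypothetical user-space model: each CPU owns a copy of the per-cpu data, and a table of byte offsets locates CPU x's copy from a common base. All names and the CPU count are made up for illustration; in the kernel the copies are allocated at boot and the offsets are measured from the linker-generated per-cpu section, which this sketch simplifies to a plain array.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4

/* Hypothetical contents of the per-cpu section. */
struct percpu_section {
	int counter;
	long last_tick;
};

/* One private copy of the section per CPU. */
static struct percpu_section percpu_storage[SKETCH_NR_CPUS];

/* Plays the role of __per_cpu_offset[]: byte offset of each CPU's copy. */
static size_t sketch_per_cpu_offset_tbl[SKETCH_NR_CPUS];

/* Same shape as the macro the patch adds. */
#define sketch_per_cpu_offset(x) (sketch_per_cpu_offset_tbl[x])

static void sketch_setup_per_cpu_areas(void)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sketch_per_cpu_offset_tbl[cpu] =
			(size_t)cpu * sizeof(struct percpu_section);
}

/* Base address plus the CPU's offset gives that CPU's copy. */
static struct percpu_section *sketch_per_cpu_ptr(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	return (struct percpu_section *)
		((unsigned char *)percpu_storage + sketch_per_cpu_offset(cpu));
}
```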
-- 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 24/28] paravirt: drop unused ptep_get_and_clear

2007-04-14 Thread Jeremy Fitzhardinge

In shadow mode hypervisors, ptep_get_and_clear achieves the desired
purpose of keeping the shadows in sync by issuing a native_get_and_clear,
followed by a call to pte_update, which indicates the PTE has been
modified.

Direct mode hypervisors (Xen) have no need for this anyway, and will trap
the update using writable pagetables.

This means no hypervisor makes use of ptep_get_and_clear; there is no
reason to have it in the paravirt-ops structure.  Change confusing
terminology about raw vs. native functions into consistent use of
native_pte_xxx for operations which do not invoke paravirt-ops.

Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

---
 arch/i386/kernel/paravirt.c |2 --
 include/asm-i386/paravirt.h |   13 +
 include/asm-i386/pgtable.h  |4 +---
 3 files changed, 2 insertions(+), 17 deletions(-)

===
--- a/arch/i386/kernel/paravirt.c
+++ b/arch/i386/kernel/paravirt.c
@@ -315,8 +315,6 @@ struct paravirt_ops paravirt_ops = {
.pte_update = paravirt_nop,
.pte_update_defer = paravirt_nop,
 
-   .ptep_get_and_clear = native_ptep_get_and_clear,
-
 #ifdef CONFIG_HIGHPTE
.kmap_atomic_pte = kmap_atomic,
 #endif
===
--- a/include/asm-i386/paravirt.h
+++ b/include/asm-i386/paravirt.h
@@ -187,8 +187,6 @@ struct paravirt_ops
void (*pte_update)(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
void (*pte_update_defer)(struct mm_struct *mm,
 unsigned long addr, pte_t *ptep);
-
-   pte_t (*ptep_get_and_clear)(pte_t *ptep);
 
 #ifdef CONFIG_HIGHPTE
void *(*kmap_atomic_pte)(struct page *page, enum km_type type);
@@ -859,12 +857,8 @@ static inline void pmd_clear(pmd_t *pmdp
PVOP_VCALL1(pmd_clear, pmdp);
 }
 
-static inline pte_t raw_ptep_get_and_clear(pte_t *p)
-{
-   unsigned long long val = PVOP_CALL1(unsigned long long, ptep_get_and_clear, p);
-   return (pte_t) { val, val >> 32 };
-}
 #else  /* !CONFIG_X86_PAE */
+
 static inline pte_t __pte(unsigned long val)
 {
return (pte_t) { PVOP_CALL1(unsigned long, make_pte, val) };
@@ -899,11 +893,6 @@ static inline void set_pmd(pmd_t *pmdp, 
 static inline void set_pmd(pmd_t *pmdp, pmd_t pmdval)
 {
PVOP_VCALL2(set_pmd, pmdp, pmdval.pud.pgd.pgd);
-}
-
-static inline pte_t raw_ptep_get_and_clear(pte_t *p)
-{
-   return (pte_t) { PVOP_CALL1(unsigned long, ptep_get_and_clear, p) };
 }
 #endif /* CONFIG_X86_PAE */
 
===
--- a/include/asm-i386/pgtable.h
+++ b/include/asm-i386/pgtable.h
@@ -265,8 +265,6 @@ static inline pte_t pte_mkhuge(pte_t pte
  */
 #define pte_update(mm, addr, ptep) do { } while (0)
 #define pte_update_defer(mm, addr, ptep)   do { } while (0)
-
-#define raw_ptep_get_and_clear(xp) native_ptep_get_and_clear(xp)
 #endif
 
 /*
@@ -340,7 +338,7 @@ do { \
 #define __HAVE_ARCH_PTEP_GET_AND_CLEAR
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 {
-   pte_t pte = raw_ptep_get_and_clear(ptep);
+   pte_t pte = native_ptep_get_and_clear(ptep);
pte_update(mm, addr, ptep);
return pte;
 }

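A minimal sketch of the shape ptep_get_and_clear() has after this patch: a native read-and-clear done as one atomic exchange, followed by the pte_update() notification that a shadow-mode hypervisor would hook. The PTE type, the counter and all sketch_* names are hypothetical, and C11 atomics stand in for an atomic exchange instruction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 32-bit (non-PAE style) PTE held in an atomic word. */
typedef struct { _Atomic uint32_t val; } sketch_pte_t;

/* Counts how often the shadow-sync hook ran. */
static int shadow_updates;

/* Stand-in for pte_update(): a shadow-mode hypervisor would resync its
 * shadow copy here; on native hardware it is a no-op. */
static void sketch_pte_update(sketch_pte_t *ptep)
{
	(void)ptep;
	shadow_updates++;
}

/* Read and clear in one atomic step, so nothing written to the entry
 * between the read and the clear can be lost. */
static uint32_t sketch_native_ptep_get_and_clear(sketch_pte_t *ptep)
{
	return atomic_exchange(&ptep->val, 0);
}

/* Native operation first, then the notification hook, instead of a
 * dedicated paravirt op for the whole thing. */
static uint32_t sketch_ptep_get_and_clear(sketch_pte_t *ptep)
{
	uint32_t old = sketch_native_ptep_get_and_clear(ptep);

	sketch_pte_update(ptep);
	return old;
}
```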
-- 


[PATCH 07/28] cleanups to help using per-cpu variables from asm

2007-04-14 Thread Jeremy Fitzhardinge

This patch does a few small cleanups:
 - use PER_CPU_NAME to generate the names of per-cpu variables
 - use lea to add the per_cpu offset in PER_CPU(), because it doesn't
   affect condition flags
 - add PER_CPU_VAR which allows direct access to per-cpu variables
   with the %fs: prefix on SMP.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>

---
 include/asm-i386/percpu.h |   12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

===
--- a/include/asm-i386/percpu.h
+++ b/include/asm-i386/percpu.h
@@ -16,12 +16,14 @@
  *PER_CPU(cpu_gdt_descr, %ebx)
  */
 #ifdef CONFIG_SMP
+#define PER_CPU(var, reg)  \
+   movl %fs:per_cpu__##this_cpu_off, reg;  \
+   lea per_cpu__##var(reg), reg
+#define PER_CPU_VAR(var)   %fs:per_cpu__##var
+#else /* ! SMP */
 #define PER_CPU(var, reg)  \
-   movl %fs:per_cpu__this_cpu_off, reg;\
-   addl $per_cpu__##var, reg
-#else /* ! SMP */
-#define PER_CPU(var, reg) \
-   movl $per_cpu__##var, reg;
+   movl $per_cpu__##var, reg
+#define PER_CPU_VAR(var)   per_cpu__##var
 #endif /* SMP */
 
 #else /* ...!ASSEMBLY */

-- 


[PATCH 27/28] paravirt: little compile fixes for vmi.c

2007-04-14 Thread Jeremy Fitzhardinge

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Zachary Amsden <[EMAIL PROTECTED]>

---
 arch/i386/kernel/vmi.c |   10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

===
--- a/arch/i386/kernel/vmi.c
+++ b/arch/i386/kernel/vmi.c
@@ -73,10 +73,6 @@ static struct {
void (*set_lazy_mode)(int mode);
 } vmi_ops;
 
-/* XXX move this to alternative.h */
-extern struct paravirt_patch __parainstructions[],
-   __parainstructions_end[];
-
 /* Cached VMI operations */
 struct vmi_timer_ops vmi_timer_ops;
 
@@ -548,9 +544,9 @@ vmi_startup_ipi_hook(int phys_apicid, un
 }
 #endif
 
-static void vmi_set_lazy_mode(int mode)
-{
-   static DEFINE_PER_CPU(int, lazy_mode);
+static void vmi_set_lazy_mode(enum paravirt_lazy_mode mode)
+{
+   static DEFINE_PER_CPU(enum paravirt_lazy_mode, lazy_mode);
 
if (!vmi_ops.set_lazy_mode)
return;

-- 


[PATCH 00/28] Updates for firstfloor paravirt-ops patches

2007-04-14 Thread Jeremy Fitzhardinge

Hi Andi,

This is a set of updates for the firstfloor patch queue.

Quick rundown:

revert-mm-x86_64-mm-account-for-module-percpu-space-separately-from-kernel-percpu.patch
separate-module-percpu-space.patch
Update the module percpu accounting patch

fix-ff-allow-percpu-variables-to-be-page-aligned.patch
Make sure the percpu memory allocation is page-aligned

deflate-stack-usage-in-lib_inflate_c.patch
Fix deflate stack usage.  With all the arch-fixes rolled in.

i386-gdt-cleanups-page-align-the-gdt.patch
i386-convert-pda-into-the-percpu-section.patch
i386-cleanups-to-help-using-per-cpu-variables-from-asm.patch
percpu-define-per_cpu_offset.patch
fix-uniproc-gdt-bugs.patch
Percpu and GDT fixes.

x86-map-enough-initial-memory.patch
Fix head.S to map enough memory.

cleanup-cleanup-asm-bugs_h.patch
cleanup-identify_cpu-fix.patch
This is the pair I tried to post yesterday, but they
got interrupted by a network outage.  They basically
add a little more cleanup, and move a misplaced hunk.
cleanup-cleanup-asm-bugs_h.patch should go after/roll
into the clean-up-asm-(i386|x86_64)-bugs_h patches, and
cleanup-identify_cpu-fix.patch should go after/roll into
clean-up-identify_cpu patch.

paravirt-flush-on-kunmap_atomic.patch
Flush pending lazy mmu operations on kunmap_atomic too.

paravirt-fix-paravirt_lazy.patch
Fix up an apparent mismerge: remove the #defines for
PARAVIRT_LAZY_* and add _LAZY_FLUSH to the enum.

i386-sysenter-arch-pages-fix.patch
i386-acpi-remove-earlyquirk-warning.patch
i386-mcheck-p4-grotesque-and-needless-warning-fix.patch
i386-pgd-clone-under-lock-fix.patch
paravirt-kmap_atomic_pte-tidy.patch
vmi-supports-compat-vdso.patch
vmi-kmap_atomic_pte-fix.patch
vmi-timer-update.patch
buslogic-check-range-fixes.patch
pte-drop-ptep_get_and_clear-paravirt-op.patch
A chunk of patches from Zach.

rename-the-parainstructions-symbols-to-be-consistent-with-the-others.patch
rename-the-parainstructions-symbols-to-be-consistent-with-the-others-fix.patch
Obvious.

vmi-fix-ff.patch
Make VMI compile in the -ff patchstack.

paravirt-sched-clock-ff.patch
Updated paravirt-sched-clock for your sched_clock.

Thanks,
J

-- 


[PATCH 01/28] revert account-for-module-percpu-space-separately-from-kernel-percpu

2007-04-14 Thread Jeremy Fitzhardinge

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

---
 include/asm-i386/percpu.h |   10 --
 1 file changed, 10 deletions(-)

===
--- a/include/asm-i386/percpu.h
+++ b/include/asm-i386/percpu.h
@@ -4,16 +4,6 @@
 #ifndef __ASSEMBLY__
 #include 
 #else
-
-#ifdef CONFIG_MODULES
-# define PERCPU_MODULE_RESERVE 8192
-#else
-# define PERCPU_MODULE_RESERVE 0
-#endif
-
-#define PERCPU_ENOUGH_ROOM \
-   (ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES) + \
-PERCPU_MODULE_RESERVE)
 
 /*
  * PER_CPU finds an address of a per-cpu variable.

-- 


Re: ZFS with Linux: An Open Plea

2007-04-14 Thread hui

On Sat, Apr 14, 2007 at 10:04:23AM -0400, Mike Snitzer wrote:
> ZFS does have some powerful features but much of it depends on their
> broken layering of volume management.  Embedding the equivalent of LVM
> into a filesystem _feels_ quite wrong.

They have a clustering concept in their volume management that isn't
expressible with something like LVM. That justifies their approach from
what I can see.

> That aside, the native snapshot capabilities of ZFS really stand out
> for me.  The redirect on write semantics aren't exclusive to ZFS;
> NetApp's WAFL employs the same.  But with both ZFS and WAFL they were
> designed to do snapshots extremely well from the ground up.

Write allocation for these kinds of systems (especially when concerned
with mirroring) is non-trivial.

> Unfortunately in order for Linux to incorporate such a feature I'd
> imagine a new filesystem would need to be developed with redirect on
> write at its core.  Can't really see ext4 or any other existing Linux
> filesystem grafting such a feature into it.  But even though I can't
> see it; do others?

You also can't use the standard page cache to buffer all of the sophisticated
semantics of these systems; you have to create your own.

> I've learned that Sun and NetApp's lawyers had it out over the
> redirect on write capability of ZFS.  When the dust settled Sun had
> enough patent protection to motivate a truce with NetApp.

I think they are still talking, and it's far from over, last I heard.
The creation of a new inode and descending indirect blocks is a fundamental
concept behind WAFL. Also, ZFS tends to be heavyweight as far as
metadata goes, quite possibly unnecessarily so, which is likely to affect
performance for things related to keeping a relevant block allocation map in
memory. ZFS is a complete pig compared to traditional file systems.
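As a toy illustration of redirect-on-write (not the actual ZFS or WAFL on-disk layout), the sketch below keeps a root block pointing at data blocks. A write never touches an existing block: it allocates a new data block and a new copy of the root, so a snapshot is just an old root number. The pool size, fanout and never-freeing bump allocator are assumptions; deciding where new blocks go and when old ones can be reused is exactly the write-allocation problem this model leaves out.

```c
#include <assert.h>

#define POOL_BLOCKS 64 /* hypothetical tiny "disk" */
#define FANOUT 4       /* block pointers per indirect block */

/* Toy block: data blocks use `data`, indirect blocks use `ptr`. */
struct block {
	int data;
	int ptr[FANOUT];
};

static struct block pool[POOL_BLOCKS];
static int next_free; /* bump allocator; blocks are never freed */

static int alloc_block(void)
{
	assert(next_free < POOL_BLOCKS);
	return next_free++;
}

/* Initial tree: one root block over FANOUT zeroed data blocks. */
static int make_tree(void)
{
	int root = alloc_block();

	for (int i = 0; i < FANOUT; i++)
		pool[root].ptr[i] = alloc_block();
	return root;
}

static int read_data(int root, int i)
{
	return pool[pool[root].ptr[i]].data;
}

/* Redirect-on-write: new data goes to a fresh block, and the parent is
 * copied to a fresh block that points at it. Nothing reachable from the
 * old root changes. Returns the new root. */
static int row_write(int root, int i, int value)
{
	int d = alloc_block();
	int new_root = alloc_block();

	pool[d].data = value;
	pool[new_root] = pool[root];
	pool[new_root].ptr[i] = d;
	return new_root;
}
```

Unchanged data blocks are shared between the live tree and the snapshot, which is why snapshots are cheap in this style of design.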

> The interesting side-effect is now ZFS is "open" and with that comes
> redirect on write in a file system other than WAFL.  But ZFS's CDDL
> conflicts with the GPL so I'm not too sure how Linux could hit the
> ground running in this potentially patent mired area of filesystem
> development.  The validity of NetApp having patented redirect on write
> aside; does the conflict between CDDL and GPL _really_ matter?  Or did
> the CDDL release of ZFS somehow undermine NetApp's WAFL patent?

That doesn't really matter. FUSE could be extended to handle this kind
of stuff and still have it be in userspace. The BSDs get around including
Stephen Tweedie's ext2 header file by making the user manually
compile it. That's not a problem for Linux folks who can download a
patch and compile a kernel.

FreeBSD already has a port of ZFS. Just for kicks, Google for that as
a possible basis for a Linux kernel port.

bill


Re: 2.6.21-rc6 + firstfloor patches: BUG: sleeping function called from invalid context at kernel/sched.c:3643

2007-04-14 Thread Jeremy Fitzhardinge

Andi Kleen wrote:
> Fixed now. The latest sched-clock was leaking preempt counts during
> cpu frequency changes.
>   

No, that didn't help.  I think it's cpufreq:

Apr 14 13:58:29 localhost kernel: BUG: scheduling while atomic: swapper/0x0002/1
Apr 14 13:58:29 localhost kernel: 2 locks held by swapper/1:
Apr 14 13:58:29 localhost kernel:  #0:  (&per_cpu(cpu_policy_rwsem, cpu)){--..}, at: [] lock_policy_rwsem_write+0x35/0x5f
Apr 14 13:58:29 localhost kernel:  #1:  (userspace_mutex){--..}, at: [] mutex_lock+0x1f/0x23
Apr 14 13:58:29 localhost kernel:  [] show_trace_log_lvl+0x1a/0x30
Apr 14 13:58:29 localhost kernel:  [] show_trace+0x12/0x14
Apr 14 13:58:29 localhost kernel:  [] dump_stack+0x16/0x18
Apr 14 13:58:29 localhost kernel:  [] __sched_text_start+0x79/0x86a
Apr 14 13:58:29 localhost kernel:  [] wait_for_completion+0x74/0xaa
Apr 14 13:58:29 localhost kernel:  [] set_cpus_allowed+0x6e/0x8c
Apr 14 13:58:29 localhost kernel:  [] acpi_cpufreq_target+0x18d/0x262
Apr 14 13:58:29 localhost kernel:  [] __cpufreq_driver_target+0x27/0x32
Apr 14 13:58:29 localhost kernel:  [] cpufreq_governor_userspace+0x120/0x154
Apr 14 13:58:29 localhost kernel:  [] __cpufreq_governor+0x77/0xab
Apr 14 13:58:29 localhost kernel:  [] __cpufreq_set_policy+0x109/0x11a
Apr 14 13:58:29 localhost kernel:  [] cpufreq_set_policy+0x32/0x6c
Apr 14 13:58:29 localhost kernel:  [] cpufreq_add_dev+0x347/0x3ea
Apr 14 13:58:29 localhost kernel:  [] sysdev_driver_register+0x62/0xaf
Apr 14 13:58:29 localhost kernel:  [] cpufreq_register_driver+0x82/0xf2
Apr 14 13:58:29 localhost kernel:  [] acpi_cpufreq_init+0x8d/0x93
Apr 14 13:58:29 localhost kernel:  [] init+0x14b/0x241
Apr 14 13:58:29 localhost kernel:  [] kernel_thread_helper+0x7/0x10
Apr 14 13:58:29 localhost kernel:  ===
Apr 14 13:58:29 localhost kernel: initcall at 0xc0454e97: acpi_cpufreq_init+0x0/0x93(): returned with preemption imbalance


J
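The trace shows an initcall returning with a raised preempt count, after which the sleeping wait_for_completion() trips the "scheduling while atomic" check. A hypothetical user-space model of that bookkeeping (a plain counter rather than the kernel's preempt_count(), and a checker only loosely modelled on the initcall message above):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-task preemption counter. */
static int preempt_count;

static void sketch_preempt_disable(void) { preempt_count++; }
static void sketch_preempt_enable(void)  { preempt_count--; }

/* In this model, sleeping is only legal when the counter is zero;
 * otherwise the kernel would report "scheduling while atomic". */
static int sketch_can_sleep(void)
{
	return preempt_count == 0;
}

typedef int (*initcall_t)(void);

/* Run an init function, then report and repair any change in the
 * counter. Returns 1 if an imbalance was found, 0 otherwise. */
static int run_initcall_checked(initcall_t fn, const char *name)
{
	int before = preempt_count;

	(void)fn();
	if (preempt_count != before) {
		fprintf(stderr, "initcall %s: returned with preemption imbalance\n",
			name);
		preempt_count = before;
		return 1;
	}
	return 0;
}

/* Buggy init: a get_cpu_var()-style access disables preemption and the
 * matching put_cpu_var() is missing. */
static int leaky_init(void)
{
	sketch_preempt_disable();
	return 0;
}

/* Correct init: every disable is paired with an enable. */
static int balanced_init(void)
{
	sketch_preempt_disable();
	sketch_preempt_enable();
	return 0;
}
```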

Re: [CRYPTO] is it really optimized ?

2007-04-14 Thread Francis Moreau

On 4/14/07, Herbert Xu <[EMAIL PROTECTED]> wrote:

> Francis Moreau <[EMAIL PROTECTED]> wrote:
> >
> > hmm yes indeed it should do the job, but I don't see how you do that.
> > For example, let's say I want to use "aes-foo" with eCryptfs. I can give
> > a higher priority to "aes-foo" than to the "aes" one. When eCryptfs asks for
> > an aes cipher it will pass the "aes" name, and since "aes-foo" has a higher
> > priority the crypto core will return the "aes-foo" cipher, right? But
> > in this scheme, eCryptfs does not have a higher priority than other kernel
> > users. How can I prevent others from using "aes-foo"?
>
> You would assign "aes-foo" a lower priority and then tell eCryptfs to
> use "aes-foo" instead of "aes".

ok, but do you think it's safe to assume that no other parts of the
kernel will request "aes-foo"? Remember that the main point is to
optimize "aes-foo".

I would say that it would be better if "aes-foo" could raise a flag,
for example indicating to the crypto core that this algo can be
instantiated only once...

thanks
--
Francis
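A heavily simplified model of the lookup behaviour Herbert describes, assuming a request matches either an implementation's unique driver name exactly or its generic algorithm name, with the highest priority winning among generic matches. The registry, its entries and lookup_alg() are hypothetical; this is not the kernel crypto API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registry entry: generic name, unique driver name, priority. */
struct alg_entry {
	const char *name;        /* e.g. "aes" */
	const char *driver_name; /* e.g. "aes-generic", "aes-foo" */
	int priority;
};

static const struct alg_entry registry[] = {
	{ "aes", "aes-generic", 100 },
	{ "aes", "aes-foo",      50 }, /* deliberately lower than generic */
};

/* An exact driver-name match wins immediately; otherwise return the
 * highest-priority entry whose generic name matches, or NULL. */
static const struct alg_entry *lookup_alg(const char *req)
{
	const struct alg_entry *best = NULL;

	for (size_t i = 0; i < sizeof(registry) / sizeof(registry[0]); i++) {
		const struct alg_entry *e = &registry[i];

		if (strcmp(e->driver_name, req) == 0)
			return e;
		if (strcmp(e->name, req) == 0 &&
		    (best == NULL || e->priority > best->priority))
			best = e;
	}
	return best;
}
```

Giving "aes-foo" a lower priority keeps plain "aes" requests on the generic code, but nothing in this model stops another caller from asking for "aes-foo" by name, which is the gap Francis is pointing at.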

Re: [2.6.20.4] BUG: dentry xattrs still in use in shrink_dcache_for_umount() with reiserfs

2007-04-14 Thread Andrea Righi

FYI, this bug also occurs in vanilla 2.6.20.7...

Honestly I don't know if I'm doing nasty things there, but I tested the
following patch and it seems to fix the problem (at least for my case).

It explicitly invalidates all the dentries in the reiserfs "private" dir
and releases all the valid xattrs references before calling
kill_block_super().

Signed-off-by: Andrea Righi <[EMAIL PROTECTED]>

--- linux-2.6.20.7/include/linux/reiserfs_xattr.h.orig  2007-04-14 22:00:38.0 +0200
+++ linux-2.6.20.7/include/linux/reiserfs_xattr.h   2007-04-14 22:12:43.0 +0200
@@ -7,6 +7,9 @@
 /* Magic value in header */
 #define REISERFS_XATTR_MAGIC 0x52465841/* "RFXA" */
 
+#define PRIVROOT_NAME ".reiserfs_priv"
+#define XAROOT_NAME   "xattrs"
+
 struct reiserfs_xattr_header {
__le32 h_magic; /* magic number for identification */
__le32 h_hash;  /* hash of the value */
--- linux-2.6.20.7/fs/reiserfs/xattr.c.orig 2007-04-14 18:53:02.0 +0200
+++ linux-2.6.20.7/fs/reiserfs/xattr.c  2007-04-14 22:12:43.0 +0200
@@ -48,8 +48,6 @@
 
 #define FL_READONLY 128
 #define FL_DIR_SEM_HELD 256
-#define PRIVROOT_NAME ".reiserfs_priv"
-#define XAROOT_NAME   "xattrs"
 
 static struct reiserfs_xattr_handler *find_xattr_handler_prefix(const char
*prefix);
--- linux-2.6.20.7/fs/reiserfs/super.c.orig 2007-04-14 18:53:06.0 +0200
+++ linux-2.6.20.7/fs/reiserfs/super.c  2007-04-14 22:47:06.0 +0200
@@ -432,17 +432,30 @@ int remove_save_link(struct inode *inode
 
 static void reiserfs_kill_sb(struct super_block *s)
 {
+   struct dentry *priv;
+
if (REISERFS_SB(s)) {
-   if (REISERFS_SB(s)->xattr_root) {
-   d_invalidate(REISERFS_SB(s)->xattr_root);
-   dput(REISERFS_SB(s)->xattr_root);
-   REISERFS_SB(s)->xattr_root = NULL;
-   }
+   priv = REISERFS_SB(s)->priv_root;
+   if (priv) {
+   struct dentry *loop, *tmp;
 
-   if (REISERFS_SB(s)->priv_root) {
-   d_invalidate(REISERFS_SB(s)->priv_root);
-   dput(REISERFS_SB(s)->priv_root);
REISERFS_SB(s)->priv_root = NULL;
+#ifdef CONFIG_REISERFS_FS_XATTR
+   REISERFS_SB(s)->xattr_root = NULL;
+
+   list_for_each_entry_safe(loop, tmp,
+   &priv->d_subdirs,
+   d_u.d_child) {
+   d_invalidate(loop);
+   if (!strcmp(loop->d_name.name, XAROOT_NAME)) {
+   if (loop->d_inode) {
+   dput(loop);
+   }
+   }
+   }
+#endif
+   d_invalidate(priv);
+   dput(priv);
}
}
 
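The loop in the patch uses list_for_each_entry_safe() because the body may drop the entry it is looking at. A user-space re-implementation of the idea (a minimal circular list in the style of <linux/list.h>; struct child, remove_named() and the helpers are hypothetical) shows why the next pointer must be saved before the current node is unlinked and freed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Minimal circular doubly linked list in the style of <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical child entry: a name and the sibling link. */
struct child {
	const char *name;
	struct list_head sibling;
};

static struct child *add_child(struct list_head *head, const char *name)
{
	struct child *c = malloc(sizeof(*c));

	assert(c != NULL);
	c->name = name;
	list_add_tail(&c->sibling, head);
	return c;
}

static int count_children(const struct list_head *head)
{
	int n = 0;

	for (const struct list_head *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* Remove and free every child called `victim`. `tmp` holds the next node
 * before the body runs, so freeing `pos` does not break the walk. */
static int remove_named(struct list_head *head, const char *victim)
{
	struct list_head *pos, *tmp;
	int removed = 0;

	for (pos = head->next, tmp = pos->next; pos != head;
	     pos = tmp, tmp = pos->next) {
		struct child *c = container_of(pos, struct child, sibling);

		if (strcmp(c->name, victim) == 0) {
			list_del(pos);
			free(c);
			removed++;
		}
	}
	return removed;
}
```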

Re: ZFS with Linux: An Open Plea

2007-04-14 Thread Jan Engelhardt


On Apr 14 2007 10:04, Mike Snitzer wrote:
>
> ZFS does have some powerful features but much of it depends on their
> broken layering of volume management.  Embedding the equivalent of LVM
> into a filesystem _feels_ quite wrong.
>
>[...]
>
> Unfortunately in order for Linux to incorporate such a feature I'd
> imagine a new filesystem would need to be developed with redirect on
> write at its core.  Can't really see ext4 or any other existing Linux
> filesystem grafting such a feature into it.  But even though I can't
> see it; do others?

FUSE-based filesystems. (That said, you might want to try it through the
"ZFS-on-fuse" fs.)


Jan
-- 

Re: Linux 2.6.21-rc6

2007-04-14 Thread Rafael J. Wysocki

On Saturday, 14 April 2007 22:25, Adrian Bunk wrote:
> On Sat, Apr 14, 2007 at 10:23:31PM +0200, Rafael J. Wysocki wrote:
> >...
> > Also, would it be feasible for you to use 'shutdown' as a workaround in case
> > the source of the problem is difficult to find and/or fix?
> 
> One person reporting a regression against a -rc kernel can mean
> hundreds or thousands of people who will run into the same issue after
> 2.6.21 is released if a manual workaround is required...

Well, in this particular case it is not very likely to happen.  I have three
x86_64 machines here with totally different chipsets/devices on which I'm
not seeing anything like that, and I believe we'd have had more reports by now
if that were a common issue.

That said, I'm not going to ignore it.  I'll do my best to debug and fix it, if
Tobias helps me. :-)

Greetings,
Rafael

Re: Linux 2.6.21-rc6

2007-04-14 Thread Adrian Bunk

On Sat, Apr 14, 2007 at 10:23:31PM +0200, Rafael J. Wysocki wrote:
>...
> Also, would it be feasible for you to use 'shutdown' as a workaround in case
> the source of the problem is difficult to find and/or fix?

One person reporting a regression against a -rc kernel can mean
hundreds or thousands of people who will run into the same issue after
2.6.21 is released if a manual workaround is required...

> Rafael

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed


Re: [RFC 1/1] Char: mxser_new, fix recursive locking

2007-04-14 Thread Jan Yenya Kasprzak

Jiri Slaby wrote:
: On 4/14/07, Jan Yenya Kasprzak <[EMAIL PROTECTED]> wrote:
: >Jiri Slaby wrote:
: >: On 4/14/07, Jan Yenya Kasprzak <[EMAIL PROTECTED]> wrote:
: >: >~BUG: spinlock lockup on CPU#0, sshd/1671, 80557780
: >: [...]
: >: >the write(1, "/file/name/...", ...) call returned -EIO.
: >:
: >: Just a question: both with mxser_new, right?
: >
: >No. One side has a multiport C168H with mxser_new, and the other
: 
: I meant both lockup and EIO error -- I guess so from this line.

Yes.

-Yenya


-- 
| Jan "Yenya" Kasprzak   |
| GPG: ID 1024/D3498839  Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/Journal: http://www.fi.muni.cz/~kas/blog/ |
> I will never go to meetings again because I think  face to face meetings <
> are the biggest waste of time you can ever have.--Linus Torvalds <

Re: [RFC 1/1] Char: mxser_new, fix recursive locking

2007-04-14 Thread Jiri Slaby


On 4/14/07, Jan Yenya Kasprzak <[EMAIL PROTECTED]> wrote:

> Jiri Slaby wrote:
> : On 4/14/07, Jan Yenya Kasprzak <[EMAIL PROTECTED]> wrote:
> : >~BUG: spinlock lockup on CPU#0, sshd/1671, 80557780
> : [...]
> : >the write(1, "/file/name/...", ...) call returned -EIO.
> :
> : Just a question: both with mxser_new, right?
>
> No. One side has a multiport C168H with mxser_new, and the other

I meant both lockup and EIO error -- I guess so from this line.

regards,
--
http://www.fi.muni.cz/~xslaby/Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E

Re: [RFC 1/1] Char: mxser_new, fix recursive locking

2007-04-14 Thread Jan Yenya Kasprzak

Jiri Slaby wrote:
: On 4/14/07, Jan Yenya Kasprzak <[EMAIL PROTECTED]> wrote:
: >~BUG: spinlock lockup on CPU#0, sshd/1671, 80557780
: [...]
: >the write(1, "/file/name/...", ...) call returned -EIO.
: 
: Just a question: both with mxser_new, right?

No. One side has a multiport C168H with mxser_new, and the other
one (the one from which the above strace is) is an ordinary server with
console on ttyS0. So the above write(1,...) has fd#1 on ttyS0.

-Yenya

-- 
| Jan "Yenya" Kasprzak   |
| GPG: ID 1024/D3498839  Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/Journal: http://www.fi.muni.cz/~kas/blog/ |
> I will never go to meetings again because I think  face to face meetings <
> are the biggest waste of time you can ever have.--Linus Torvalds <

Re: Linux 2.6.21-rc6

2007-04-14 Thread Rafael J. Wysocki

On Saturday, 14 April 2007 21:56, Tobias Diedrich wrote:
> Rafael J. Wysocki wrote:
> > On Saturday, 14 April 2007 15:00, Adrian Bunk wrote:
> > > On Sat, Apr 14, 2007 at 02:31:54PM +0200, Tobias Diedrich wrote:
> > > > Tobias Diedrich wrote:
> > > > > > ed746e3b18f4df18afa3763155972c5835f284c5 is first bad commit
> > > > > > commit ed746e3b18f4df18afa3763155972c5835f284c5
> > > > > > Author: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > > > > > Date:   Sat Feb 10 01:43:32 2007 -0800
> > > > > > 
> > > > > > [PATCH] swsusp: Change code ordering in disk.c
> > > > > > 
> > > > > > Change the ordering of code in kernel/power/disk.c so that device_suspend() is
> > > > > > called before disable_nonboot_cpus() and platform_finish() is called after
> > > > > > enable_nonboot_cpus() and before device_resume(), as indicated by the recent
> > > > > > discussion on Linux-PM (cf.
> > > > > > http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html).
> > > > > > 
> > > > > > The changes here only affect the built-in swsusp.
> > > > > > 
> > > > > > [EMAIL PROTECTED]: fix LED blinking during image load]
> > > > > > Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > > > > > Acked-by: Pavel Machek <[EMAIL PROTECTED]>
> > > > > > Cc: Greg KH <[EMAIL PROTECTED]>
> > > > > > Cc: Nigel Cunningham <[EMAIL PROTECTED]>
> > > > > > Cc: Patrick Mochel <[EMAIL PROTECTED]>
> > > > > > Cc: Alexey Starikovskiy <[EMAIL PROTECTED]>
> > > > > > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> > > > > > Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
> > > > > > 
> > > > > > :04 04 7eca5b3a8f9606bc4f2ff41192ec8c9d4ca90d18 8313b674e1d1bdf6849350af06d28a89b3bb3054 M  kernel
> > > > > > 
> > > > > > 
> > > > > > Now, the remaining test is to try reverting this commit from -rc6. :)
> > > > > 
> > > > > Doesn't apply cleanly against -rc6, but fixes the problem when
> > > > > reverted from -rc1.
> > > > 
> > > > Now, this was already reported in
> > > > http://lkml.org/lkml/2007/3/16/126
> > > > and I even flagged that message in my local folder, but apparently forgot
> > > > to follow up on it... *sigh*
> > > 
> > > Unless I misunderstood something, all of the problems Maxim described in this
> > > email are fixed for him in -rc6.
> > > 
> > > But it's quite possible that you are running into a different issue exposed
> > > by this commit.
> > 
> > Yes, it's likely.
> > 
> > Tobias, I'm unable to reproduce the problem with your .config, but my hardware
> > is certainly different.  Which suspend mode do you use?  If that's "platform",
> > can you try to use "shutdown" or "reboot" and see if that helps?
> 
> Sure.
> shutdown/reboot works fine, only platform is broken.

Thanks.

Now, I suspect the problem is somehow related to the hardware, so it would help
a lot if we could identify the piece of hardware (or driver) involved.

AFAICT, your system is a non-SMP one, so we can rule out
disable/enable_nonboot_cpus().  To confirm that the problem is related to
platform_finish(), can you please apply the appended debug patch and
see if the suspend in the 'platform' mode works with it?

Also, would it be feasible for you to use 'shutdown' as a workaround in case
the source of the problem is difficult to find and/or fix?

Rafael

---
 kernel/power/disk.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.21-rc6/kernel/power/disk.c
===
--- linux-2.6.21-rc6.orig/kernel/power/disk.c
+++ linux-2.6.21-rc6/kernel/power/disk.c
@@ -170,8 +170,8 @@ int pm_suspend_disk(void)
 
if (in_suspend) {
enable_nonboot_cpus();
-   platform_finish();
device_resume();
+   platform_finish();
resume_console();
pr_debug("PM: writing image.\n");
error = swsusp_write();
@@ -189,8 +189,8 @@ int pm_suspend_disk(void)
  Enable_cpus:
enable_nonboot_cpus();
  Resume_devices:
-   platform_finish();
device_resume();
+   platform_finish();
resume_console();
  Thaw:
unprepare_processes();


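The debug patch only swaps device_resume() and platform_finish() on the unwind path. A stubbed sketch of that goto-based unwinding (every function is a hypothetical stand-in that just logs its name; this is not kernel/power/disk.c) makes the resulting call order easy to check:

```c
#include <assert.h>
#include <string.h>

/* Records the order in which the stubbed steps run. */
static char trace[256];

static void step(const char *name)
{
	strcat(trace, name);
	strcat(trace, ";");
}

/* Which step should fail (hypothetical knob for exercising the labels). */
enum fail_point { FAIL_NONE, FAIL_DEVICES, FAIL_CPUS, FAIL_SNAPSHOT };

static int sketch_device_suspend(enum fail_point fp)
{
	step("device_suspend");
	return fp == FAIL_DEVICES ? -1 : 0;
}

static int sketch_disable_cpus(enum fail_point fp)
{
	step("disable_nonboot_cpus");
	return fp == FAIL_CPUS ? -1 : 0;
}

static void sketch_enable_cpus(void)     { step("enable_nonboot_cpus"); }
static void sketch_device_resume(void)   { step("device_resume"); }
static void sketch_platform_finish(void) { step("platform_finish"); }
static void sketch_resume_console(void)  { step("resume_console"); }

/* Setup runs top to bottom; on error we jump to the label that undoes
 * what was done so far, and the labels fall through in reverse order.
 * The unwind uses the debug patch's order: device_resume() first. */
static int sketch_suspend(enum fail_point fp)
{
	int error;

	trace[0] = '\0';
	error = sketch_device_suspend(fp);
	if (error)
		return error; /* nothing to undo yet */
	error = sketch_disable_cpus(fp);
	if (error)
		goto Resume_devices;
	error = (fp == FAIL_SNAPSHOT) ? -1 : 0; /* stands in for the snapshot */
	if (error)
		goto Enable_cpus;
	return 0; /* success: the sketch stops in the "suspended" state */

 Enable_cpus:
	sketch_enable_cpus();
 Resume_devices:
	sketch_device_resume();
	sketch_platform_finish();
	sketch_resume_console();
	return error;
}
```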

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-14 Thread Willy Tarreau

On Sat, Apr 14, 2007 at 12:48:55PM -0700, William Lee Irwin III wrote:
> On Sat, Apr 14, 2007 at 10:36:25AM +0200, Willy Tarreau wrote:
> > Forking becomes very slow above a load of 100 it seems. Sometimes,
> > the shell takes 2 or 3 seconds to return to prompt after I run
> > "scheddos &"
> > Those are very promising results, I nearly observe the same responsiveness
> > as I had on a solaris 10 with 10k running processes on a bigger machine.
> > I would be curious what a mysql test result would look like now.
> 
> Where is scheddos?

I will send it to you off-list. I've avoided publishing it for a long
time because the stock scheduler was *very* sensitive to trivial attacks
(freezes longer than 30s, impossible to log in). It's very basic, and I
have no problem sending it to anyone who requests it; it's just that as
long as some distros ship early 2.6 kernels, I do not want it to appear in
mailing list archives for anyone to grab and use to annoy their admins for free.

Cheers,
Willy


Re: [RFC 1/1] Char: mxser_new, fix recursive locking

2007-04-14 Thread Jiri Slaby


On 4/14/07, Jan Yenya Kasprzak <[EMAIL PROTECTED]> wrote:

~BUG: spinlock lockup on CPU#0, sshd/1671, 80557780

[...]

the write(1, "/file/name/...", ...) call returned -EIO.


Just a question: both with mxser_new, right?

regards,
--
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E

Re: [TEST RESULT]massive_intr.c -- cfs/vanilla/sd-0.40

2007-04-14 Thread William Lee Irwin III

On Sat, Apr 14, 2007 at 02:02:20PM +0200, Ingo Molnar wrote:
> cool. ringtest.c is intended to be used the following way: start it, it 
> will generate a 99% busy system (but it is using a ring of 100 tasks, 
> where each task runs for 100 msecs then sleeps for 1 msec, so every 
> task gets a turn every 10 seconds). If you add a pure CPU hog to the 
> system, for example an infinite shell loop:
>   while :; do :; done &
> then a 'fair' scheduler would give roughly 50% of CPU time to the CPU 
> hog (and the ringtest.c tasks take up the other 50%).

I've queued up, as work to do at some point, modifying ringtest.c to
automatically spawn a CPU hog and then report the aggregate CPU bandwidth
of the ring and the bandwidth of the CPU hog. I can't guarantee I'll get
to it in a timely fashion, though.
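The numbers in Ingo's description can be checked with a little arithmetic.
This assumes, as the 99% figure implies, that the ring hands control from
task to task so that at most one ring task is runnable at any moment, and it
ignores the 1 msec gaps when computing the turn interval:

```latex
\[
  \text{busy} = \frac{100\,\mathrm{ms}}{100\,\mathrm{ms} + 1\,\mathrm{ms}} = \frac{100}{101} \approx 99\%
\]
\[
  T_{\text{turn}} \approx 100 \times 100\,\mathrm{ms} = 10\,\mathrm{s}
\]
\[
  \text{one ring task} + \text{one CPU hog runnable, equal weights}
  \;\Rightarrow\; \text{share}_{\text{hog}} \approx \text{share}_{\text{ring}} \approx 50\%
\]
```

Counting the gaps, the turn interval is 100 x 101 ms = 10.1 s, which is still
"every 10 seconds" to within rounding.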


-- wli

Re: Linux 2.6.21-rc6

2007-04-14 Thread Tobias Diedrich

Rafael J. Wysocki wrote:
> On Saturday, 14 April 2007 15:00, Adrian Bunk wrote:
> > On Sat, Apr 14, 2007 at 02:31:54PM +0200, Tobias Diedrich wrote:
> > > Tobias Diedrich wrote:
> > > > > ed746e3b18f4df18afa3763155972c5835f284c5 is first bad commit
> > > > > commit ed746e3b18f4df18afa3763155972c5835f284c5
> > > > > Author: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > > > > Date:   Sat Feb 10 01:43:32 2007 -0800
> > > > > 
> > > > > [PATCH] swsusp: Change code ordering in disk.c
> > > > > 
> > > > > Change the ordering of code in kernel/power/disk.c so that device_suspend() is
> > > > > called before disable_nonboot_cpus() and platform_finish() is called after
> > > > > enable_nonboot_cpus() and before device_resume(), as indicated by the recent
> > > > > discussion on Linux-PM (cf.
> > > > > 
> > > > > http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html).
> > > > > 
> > > > > The changes here only affect the built-in swsusp.
> > > > > 
> > > > > [EMAIL PROTECTED]: fix LED blinking during image load]
> > > > > Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > > > > Acked-by: Pavel Machek <[EMAIL PROTECTED]>
> > > > > Cc: Greg KH <[EMAIL PROTECTED]>
> > > > > Cc: Nigel Cunningham <[EMAIL PROTECTED]>
> > > > > Cc: Patrick Mochel <[EMAIL PROTECTED]>
> > > > > Cc: Alexey Starikovskiy <[EMAIL PROTECTED]>
> > > > > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> > > > > Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
> > > > > 
> > > > > :040000 040000 7eca5b3a8f9606bc4f2ff41192ec8c9d4ca90d18 8313b674e1d1bdf6849350af06d28a89b3bb3054 M	kernel
> > > > > 
> > > > > 
> > > > > Now, the remaining test is to try reverting this commit from -rc6. :)
> > > > 
> > > > Doesn't apply cleanly against -rc6, but fixes the problem when
> > > > reverted from -rc1.
> > > 
> > > Now, this was already reported in
> > > http://lkml.org/lkml/2007/3/16/126
> > > and I even flagged that message in my local folder, but apparently forgot
> > > to follow up on it... *sigh*
> > 
> > Unless I misunderstood something, all of the problems Maxim described in 
> > this email are fixed for him in -rc6.
> > 
> > But it's quite possible that you are running into a different issue 
> > exposed by this commit.
> 
> Yes, it's likely.
> 
> Tobias, I'm unable to reproduce the problem with your .config, but my hardware
> is certainly different.  Which suspend mode do you use?  If that's "platform",
> can you try to use "shutdown" or "reboot" and see if that helps?

Sure.
shutdown/reboot works fine, only platform is broken.

-- 
Tobias  PGP: http://9ac7e0bc.uguu.de

Re: [RFC 1/1] Char: mxser_new, fix recursive locking

2007-04-14 Thread Jan Yenya Kasprzak

Jiri Slaby wrote:
: >I have another problem with the driver - it probably sometimes
: >drops DCD signal on the serial line or something like that:
: >when the traffic on the serial console is heavy, it sometimes disconnects
: >me from the remote shell, and cu(1) displays the login prompt from the new
: >instance of mgetty of the remote machine. However, it does so both with
: >mxser.o and mxser_new.o (in 2.6.21-rc6, I think it worked in 2.6.19,
: >but I have to retest it). So this is another problem, different from
: >the one we are trying to solve now.
: 
: There were some changes, but nothing significant in mxser.c; maybe
: some of the tty or ldisc layer changes (though there is only the termios ->
: ktermios switch plus a few other things). This would probably be hard to
: find without bisecting, if 2.6.19 is really OK for you.

OK, I'll try to bisect in a few weeks.

: The only idea I have right now is to run a process under nohup which will do:
: int ret, fd = open("/dev/ttyMIXX", O_RDONLY | O_NONBLOCK);
: while (1) {
:  ioctl(fd, TIOCMIWAIT, TIOCM_CD);
:  ioctl(fd, TIOCMGET, &ret);
:  printf("%ld: carrier has changed: %d\n", (long)time(NULL), !!(ret & TIOCM_CD));
: }

Hmm, I have tried to run this, and got a machine lockup, and after
a minute or so the following was printed to the console:

~BUG: spinlock lockup on CPU#0, sshd/1671, 80557780

(I was logged in using ssh, and ran the above code inside the ssh session).

I have tried to look at it from the remote side as well:
stracing the "find / -print" revealed that after some time,
the write(1, "/file/name/...", ...) call returned -EIO.
And from this point on, all subsequent writes to fd #1 returned
-EIO as well.
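For reference, the carrier-watch loop Jiri suggested in the quote above can
be turned into a compilable C function roughly as follows. This is only a
sketch and not a fix for the lockup: it declares the variables the snippet
left out, passes &bits to TIOCMGET, and prints DCD transitions.
watch_carrier() and carrier_changed() are hypothetical names, and the device
path is left to the caller, since "/dev/ttyMIXX" in the original is a
placeholder.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <time.h>
#include <unistd.h>

/* Return 1 if the DCD (carrier detect) bit differs between two
 * TIOCMGET snapshots, 0 otherwise. */
int carrier_changed(int old_bits, int new_bits)
{
	return ((old_bits ^ new_bits) & TIOCM_CD) != 0;
}

/* Hypothetical helper: wait for DCD transitions on the given tty and
 * print a timestamped line for each one.  Loops until an ioctl fails,
 * then returns -1. */
int watch_carrier(const char *path)
{
	int fd, old, bits;

	fd = open(path, O_RDONLY | O_NONBLOCK);
	if (fd < 0) {
		perror("open");
		return -1;
	}
	if (ioctl(fd, TIOCMGET, &old) < 0) {
		perror("TIOCMGET");
		close(fd);
		return -1;
	}
	for (;;) {
		/* TIOCMIWAIT takes the line mask by value, not by pointer. */
		if (ioctl(fd, TIOCMIWAIT, (unsigned long)TIOCM_CD) < 0) {
			perror("TIOCMIWAIT");
			break;
		}
		/* TIOCMGET writes the current line state through an int *. */
		if (ioctl(fd, TIOCMGET, &bits) < 0) {
			perror("TIOCMGET");
			break;
		}
		if (carrier_changed(old, bits))
			printf("%ld: carrier has changed: %d\n",
			       (long)time(NULL), !!(bits & TIOCM_CD));
		old = bits;
	}
	close(fd);
	return -1;
}
```

Run under nohup with the real port as the argument; given the lockup
reported above, expect it may trigger the same problem.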

-Yenya

-- 
| Jan "Yenya" Kasprzak   |
| GPG: ID 1024/D3498839  Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/    Journal: http://www.fi.muni.cz/~kas/blog/ |
> I will never go to meetings again because I think  face to face meetings <
> are the biggest waste of time you can ever have.--Linus Torvalds <

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-14 Thread William Lee Irwin III

On Sat, Apr 14, 2007 at 10:36:25AM +0200, Willy Tarreau wrote:
> Forking becomes very slow above a load of 100 it seems. Sometimes,
> the shell takes 2 or 3 seconds to return to prompt after I run
> "scheddos &"
> Those are very promising results, I nearly observe the same responsiveness
> as I had on a solaris 10 with 10k running processes on a bigger machine.
> I would be curious what a mysql test result would look like now.

Where is scheddos?


-- wli

Re: [PATCH] kthread: Enhance kthread_stop to abort interruptible sleeps

2007-04-14 Thread Oleg Nesterov

On 04/14, Eric W. Biederman wrote:
>
> Oleg Nesterov <[EMAIL PROTECTED]> writes:
> 
> > On 04/13, Eric W. Biederman wrote:
> >>
> >> +static inline int __kthread_should_stop(struct task_struct *tsk)
> >> +{
> >> +  return test_tsk_thread_flag(tsk, TIF_KTHREAD_STOP);
> >> +}
> >
> > Am I blind? Where does copy_process/dup_task_struct clear unwanted
> > flags in thread_info->flags?
> 
> Good question.  It is only a real problem if someone forks a kernel
> thread after we ask it to die, but it does appear to be an issue, both
> with this usage and with the same usage by the process freezer.
> 
> We do have these lines in copy_process...
> 
> 	clear_tsk_thread_flag(p, TIF_SIGPENDING);
> 	init_sigpending(&p->pending);
> 
> I don't know what we want to do about TIF_KTHREAD_STOP and TIF_FREEZE.

Perhaps we need a _TIF_CLEAR_ON_FORK_MASK. It probably doesn't matter right
now, but IMHO it is still not safe in general.
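Since the child's thread_info is copied from the parent's, the mask idea
amounts to filtering the copied flags. A minimal sketch of that filtering in
plain C; the TOY_* bit numbers and the mask are hypothetical (real TIF_*
values are per-architecture), and no such mask exists in the code under
discussion.

```c
/* Hypothetical bit numbers, chosen only for illustration. */
#define TOY_TIF_OTHER         0   /* stands for any flag a child may inherit */
#define TOY_TIF_SIGPENDING    1
#define TOY_TIF_FREEZE        2
#define TOY_TIF_KTHREAD_STOP  3

#define TOY_FLAG(bit)         (1UL << (bit))

/* Flags that describe a request aimed at the parent only and must not
 * leak into a freshly forked child. */
#define TOY_TIF_CLEAR_ON_FORK_MASK \
	(TOY_FLAG(TOY_TIF_SIGPENDING) | TOY_FLAG(TOY_TIF_FREEZE) | \
	 TOY_FLAG(TOY_TIF_KTHREAD_STOP))

/* Flags the child would start with if the copy were filtered. */
unsigned long toy_child_flags(unsigned long parent_flags)
{
	return parent_flags & ~TOY_TIF_CLEAR_ON_FORK_MASK;
}
```

A single mask keeps the policy in one place, instead of one
clear_tsk_thread_flag() call per flag in copy_process().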

> Right now we will go along our merry way until we hit:
> 
> 	recalc_sigpending();
> 	if (signal_pending(current)) {
> 		spin_unlock(&current->sighand->siglock);
> 		write_unlock_irq(&tasklist_lock);
> 		retval = -ERESTARTNOINTR;
> 		goto bad_fork_cleanup_namespaces;
> 	}
> 
> And copy_process will fail.  Since that is an expected failure point,
> that actually seems like reasonable behavior in this case: if you
> are being frozen or are being told to die, you can't fork.
> 
> It does ensure that these additional kernel flags won't make it
> onto new instances of struct task_struct, which is the important
> thing from a correctness standpoint.

Note that we set TIF_FREEZE and TIF_KTHREAD_STOP outside of ->siglock,
so both flags can leak onto the child. Again, not a problem right now.
TIF_KTHREAD_STOP doesn't matter unless the process was created via kthread_create(),
but in that case it can't inherit TIF_KTHREAD_STOP.

Oleg.


Re: [CRYPTO] is it really optimized ?

2007-04-14 Thread Herbert Xu

Francis Moreau <[EMAIL PROTECTED]> wrote:
> 
> hmm yes indeed it should do the job, but I don't see how you do that.
> For example, let's say I want to use "aes-foo" with eCryptfs. I can give
> "aes-foo" a higher priority than the "aes" one. When eCryptfs asks for
> an aes cipher it will pass the "aes" name, and since "aes-foo" has a higher
> priority the crypto core will return the "aes-foo" cipher, right? But
> in this scheme, eCryptfs does not have a higher priority than other kernel
> users. How can I prevent others from using "aes-foo"?

You would assign "aes-foo" a lower priority and then tell eCryptfs to
use "aes-foo" instead of "aes".
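Herbert's answer relies on how a name is resolved: a request for a generic
name such as "aes" gets the highest-priority registered implementation,
while a request for a driver name such as "aes-foo" selects that
implementation exactly. Giving "aes-foo" a lower priority therefore keeps it
away from generic "aes" users while eCryptfs can still ask for it by name.
The toy lookup below models that rule; it is not the kernel's crypto API,
though its field names mirror cra_name, cra_driver_name and cra_priority in
struct crypto_alg, and the priorities in the test are made up.

```c
#include <stddef.h>
#include <string.h>

/* Toy registry entry (hypothetical). */
struct alg {
	const char *cra_name;        /* generic algorithm name, e.g. "aes" */
	const char *cra_driver_name; /* one implementation, e.g. "aes-foo" */
	int cra_priority;
};

/* An exact driver-name match wins immediately; otherwise return the
 * highest-priority entry whose generic name matches, or NULL. */
const struct alg *alg_lookup(const struct alg *tbl, size_t n, const char *name)
{
	const struct alg *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(tbl[i].cra_driver_name, name) == 0)
			return &tbl[i];
		if (strcmp(tbl[i].cra_name, name) == 0 &&
		    (best == NULL || tbl[i].cra_priority > best->cra_priority))
			best = &tbl[i];
	}
	return best;
}
```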

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: GIT and the current -stable

2007-04-14 Thread Julian Phillips


On Sat, 14 Apr 2007, Chris Wright wrote:

> * Brian Gernhardt ([EMAIL PROTECTED]) wrote:
>> On Apr 14, 2007, at 4:34 AM, Chris Wright wrote:
>>> I've already put a tree like this up on kernel.org.  The master branch
>>> is Linus' tree, and there are branches for each of the stable releases
>>> called linux-2.6.[12-20].y (I didn't add 2.6.11.y).
>>>
>>> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6-stable.git;a=summary
>>
>> Is HEAD for that repo the most recent stable branch, or (as gitweb
>> makes it look) Linus's head?  I'd expect a "-stable" repo to point at
>> the most recent stable commit, not the most recent development
>> commit.  And I'd also expect gitweb's summary page to show the
>> shortlog for HEAD.  One of my assumptions is being broken and I
>> don't like it.  It leaves me all confused...
>
> As I mentioned, the master branch (HEAD) is Linus' tree, and each
> stable tree is on its own branch.  You'll find shortlog summarizes the
> main branch, so yes, gitweb's summary is a bit confusing based on your
> assumptions.  This is a new tree and hasn't been publicized until now.
> It does make sense to have its head be the newest stable; I'll switch
> that around.


Would it not make more sense to point HEAD at the linux-2.6.20.y branch
and either let master be Linus' tree or simply not have a master branch?
Otherwise, what happens to master when the latest stable tree becomes
linux-2.6.21.y?


--
Julian

 ---
Most people want either less corruption or more of a chance to
participate in it.

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-14 Thread Willy Tarreau

On Sat, Apr 14, 2007 at 12:40:15PM -0600, Eric W. Biederman wrote:
> Willy Tarreau <[EMAIL PROTECTED]> writes:
> 
> > On Sat, Apr 14, 2007 at 07:54:33PM +0200, Ingo Molnar wrote:
> >> 
> >> * Eric W. Biederman <[EMAIL PROTECTED]> wrote:
> >> 
> >> > > Thinking about it, I don't know if there are calls to schedule() 
> >> > > while switching from tty1 to tty2. Alt-F2 had no effect anymore, and 
> >> > > "chvt 2" simply blocked. It would have been possible that a 
> >> > > schedule() call somewhere got starved due to the load, I don't know.
> >> > 
> >> > It looks like there is a call to schedule_work.
> >> 
> >> so this goes over keventd, right?
> >> 
> >> > There are two pieces of the path. If you are switching in and out of a
> >> > tty controlled by something like X, user space has to grant
> >> > permission before the operation happens.  Where there isn't a
> >> > gatekeeper, I know it is cheaper, but I don't know by how much; I suspect
> >> > there is still a schedule happening in there.
> >> 
> >> Could keventd perhaps be starved? Willy, to exclude this possibility, 
> >> could you perhaps chrt keventd to RT priority? If events/0 is PID 5 then 
> >> the command to set it to SCHED_FIFO:50 would be:
> >> 
> >>   chrt -f -p 50 5
> >> 
> >> but ... events/0 is reniced to -5 by default, so it should definitely 
> >> not be starved.
> >
> > Well, since I merged the fair-fork patch, I cannot reproduce (in fact,
> > bash forks 1000 processes, then progressively execs scheddos, but it
> > takes some time). So I'm rebuilding right now. But I think that Linus
> > has an interesting clue about GPM and notification before switching
> > the terminal. I think it was enabled in console mode. I don't know
> > how that translates to frozen xterms, but let's attack the problems
> > one at a time.
> 
> I think it is a good clue.  However the intention of the mechanism is
> that only processes that change the video mode on a VT are supposed to
> use it.  So I really don't think gpm is the culprit.  However it easily could
> be something else that has similar characteristics.
> 
> I just realized we do have proof that schedule_work is actually working,
> because SAK works, and we can't sanely do SAK from interrupt context,
> so we call schedule_work.

Eric,

I can say that Linus, Ingo and you all got on the right track.
I could reproduce it: I got a hung tty at around 1400 running processes.
Fortunately, it was the one with the root shell which was reniced
to -19.

I could strace chvt 2:

20:44:23.761117 open("/dev/tty", O_RDONLY) = 3 <0.004000>
20:44:23.765117 ioctl(3, KDGKBTYPE, 0xbfa305a3) = 0 <0.024002>
20:44:23.789119 ioctl(3, VIDIOC_G_COMP or VT_ACTIVATE, 0x3) = 0 <0.000000>
20:44:23.789119 ioctl(3, VIDIOC_S_COMP or VT_WAITACTIVE 
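The strace shows chvt checking that /dev/tty is a console (KDGKBTYPE),
requesting the switch with VT_ACTIVATE, and then blocking in VT_WAITACTIVE
until the switch completes. If completing the switch goes through
schedule_work() and hence keventd, as Eric suspected and the keventd
experiment below suggests, a starved keventd leaves VT_WAITACTIVE waiting.
A minimal sketch of those last two steps using the ioctls from <linux/vt.h>;
parse_vt() and switch_vt() are hypothetical helper names, and the KDGKBTYPE
check is omitted.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vt.h>

/* Parse a VT number; returns 1..MAX_NR_CONSOLES, or -1 if invalid. */
int parse_vt(const char *s)
{
	char *end;
	long n = strtol(s, &end, 10);

	if (end == s || *end != '\0' || n < 1 || n > MAX_NR_CONSOLES)
		return -1;
	return (int)n;
}

/* Request the switch, then block until it has happened. */
int switch_vt(int vt)
{
	int fd = open("/dev/tty", O_RDONLY);

	if (fd < 0) {
		perror("open /dev/tty");
		return -1;
	}
	if (ioctl(fd, VT_ACTIVATE, vt) < 0 ||
	    ioctl(fd, VT_WAITACTIVE, vt) < 0) {	/* chvt hung in this one */
		perror("ioctl");
		close(fd);
		return -1;
	}
	close(fd);
	return 0;
}
```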

Then I applied Ingo's suggestion of changing keventd's priority:

[EMAIL PROTECTED]:~# ps auxw|grep event
root         8  0.0  0.0      0     0 ?        SW<  20:31   0:00 [events/0]
root         9  0.0  0.0      0     0 ?        RW<  20:31   0:00 [events/1]

[EMAIL PROTECTED]:~# rtprio -s 1 -p 50 8 9 (I don't have chrt but it does 
the same)

My VT immediately switched as soon as I hit Enter. Everything's
working fine again now. So the good news is that it's not a bug
in the tty code, nor a deadlock.

Now, maybe keventd should get a higher priority? It seems worrying to
me that it may starve when it seems so sensitive.
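The workaround above (rtprio here, "chrt -f -p 50 <pid>" in Ingo's
suggestion) moves the events/N threads to SCHED_FIFO priority 50. From C the
same change is a sched_setscheduler(2) call. A minimal sketch, assuming
make_fifo() and fifo_priority_valid() as hypothetical helper names; the
caller needs CAP_SYS_NICE (typically root), and the PIDs of events/0 and
events/1 (8 and 9 in the ps output above) differ from boot to boot.

```c
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

/* Return 1 if prio is within the SCHED_FIFO range on this system. */
int fifo_priority_valid(int prio)
{
	int lo = sched_get_priority_min(SCHED_FIFO);
	int hi = sched_get_priority_max(SCHED_FIFO);

	return lo != -1 && hi != -1 && prio >= lo && prio <= hi;
}

/* Rough equivalent of "chrt -f -p PRIO PID". */
int make_fifo(pid_t pid, int prio)
{
	struct sched_param sp = { 0 };

	if (!fifo_priority_valid(prio))
		return -1;
	sp.sched_priority = prio;
	if (sched_setscheduler(pid, SCHED_FIFO, &sp) < 0) {
		perror("sched_setscheduler");
		return -1;
	}
	return 0;
}
```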

Also, that may explain why I couldn't reproduce with the fork patch.
Since all new processes got no runtime at first, their impact on
existing ones must have been lower. But I think that if I had waited
longer, I would have had the problem again (though I did not see it
even under a load of 7800).

Regards,
Willy


Re: [PATCH] kthread: Enhance kthread_stop to abort interruptible sleeps

2007-04-14 Thread Eric W. Biederman

Oleg Nesterov <[EMAIL PROTECTED]> writes:

> On 04/13, Eric W. Biederman wrote:
>>
>> +static inline int __kthread_should_stop(struct task_struct *tsk)
>> +{
>> +return test_tsk_thread_flag(tsk, TIF_KTHREAD_STOP);
>> +}
>
> Am I blind? Where does copy_process/dup_task_struct clear unwanted
> flags in thread_info->flags?

Good question.  It is only a real problem if someone forks a kernel
thread after we ask it to die, but it does appear to be an issue, both
with this usage and with the same usage by the process freezer.

We do have these lines in copy_process...

	clear_tsk_thread_flag(p, TIF_SIGPENDING);
	init_sigpending(&p->pending);

I don't know what we want to do about TIF_KTHREAD_STOP and TIF_FREEZE.

Right now we will go along our merry way until we hit:

	recalc_sigpending();
	if (signal_pending(current)) {
		spin_unlock(&current->sighand->siglock);
		write_unlock_irq(&tasklist_lock);
		retval = -ERESTARTNOINTR;
		goto bad_fork_cleanup_namespaces;
	}

And copy_process will fail.  Since that is an expected failure point,
that actually seems like reasonable behavior in this case: if you
are being frozen or are being told to die, you can't fork.

It does ensure that these additional kernel flags won't make it
onto new instances of struct task_struct, which is the important
thing from a correctness standpoint.

>> +int kthread_stop(struct task_struct *tsk)
>>  {
>>  int ret;
>>  
>> -mutex_lock(&kthread_stop_lock);
>> -
>> -/* It could exit after stop_info.k set, but before wake_up_process. */
>> -get_task_struct(k);
>> +/* Ensure the task struct persists until I read the exit code. */
>> +get_task_struct(tsk);
>>  
>> -/* Must init completion *before* thread sees kthread_stop_info.k */
>> -init_completion(&kthread_stop_info.done);
>> -smp_wmb();
>> +set_tsk_thread_flag(tsk, TIF_KTHREAD_STOP);
>> +spin_lock_irq(&tsk->sighand->siglock);
>> +signal_wake_up(tsk, 1);
>> +spin_unlock_irq(&tsk->sighand->siglock);
>
> Off-topic again, the comment above signal_wake_up() is very confusing...
> I think the only reason for ->siglock is to prevent the case when
> TIF_SIGPENDING may be cleared by recalc_sigpending(). Or something else?

I'm really not certain.  I just looked and saw that every user of
signal_pending had the siglock, so I didn't delve any deeper.

> This changes the current API, kthread_stop() doesn't wake up UNINTERRUPTIBLE
> tasks any longer. I'd say this is good, but may break things ... For example,
> kthread_stop(kthread_create(...)) can't work now.

Ugh. Good point.  I haven't picked up the UNINTERRUPTIBLE change yet.
I hadn't realized this extended beyond the wait for the completion, which
it obviously does, ugh.

I don't know what to do about the theoretical freezer race; for now I
am inclined to ignore it, at least until someone audits all of the kernel
threads to be certain an API change is OK.

There is also Andrew's general objection that we should use UNINTERRUPTIBLE
sleeps very sparingly because they contribute to load average and such.

>> -/* Now set kthread_should_stop() to true, and wake it up. */
>> -kthread_stop_info.k = k;
>> -wake_up_process(k);
>> -put_task_struct(k);
>> +wait_for_completion(tsk->vfork_done);
>> +ret = tsk->exit_code;
>
> This is really good. Now the kernel thread can exit() at any point without
> fear of breaking kthread_stop().

Yes.
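The new kthread_stop() protocol is: set a per-task stop flag, wake the
thread, wait until it exits (tsk->vfork_done), then read tsk->exit_code. As a
purely userspace analogy, not kernel code, the same protocol can be sketched
with POSIX threads: a stop flag guarded by a mutex, a condition-variable
broadcast as the wakeup, and pthread_join() standing in for the completion.
The toy_kthread names and the exit code 42 are hypothetical.

```c
#include <pthread.h>

struct toy_kthread {
	pthread_t thread;
	pthread_mutex_t lock;
	pthread_cond_t wake;
	int should_stop;   /* plays the role of TIF_KTHREAD_STOP */
	int work_pending;  /* what the thread normally sleeps waiting for */
	int exit_code;
};

static void *toy_thread_fn(void *arg)
{
	struct toy_kthread *k = arg;

	pthread_mutex_lock(&k->lock);
	for (;;) {
		/* Sleep until there is work, but also give up on a stop request. */
		while (!k->work_pending && !k->should_stop)
			pthread_cond_wait(&k->wake, &k->lock);
		if (k->should_stop)
			break;
		k->work_pending = 0;	/* "do the work" */
	}
	pthread_mutex_unlock(&k->lock);
	k->exit_code = 42;	/* arbitrary value read back by the stopper */
	return NULL;
}

int toy_kthread_start(struct toy_kthread *k)
{
	pthread_mutex_init(&k->lock, NULL);
	pthread_cond_init(&k->wake, NULL);
	k->should_stop = 0;
	k->work_pending = 0;
	k->exit_code = 0;
	return pthread_create(&k->thread, NULL, toy_thread_fn, k);
}

/* Set the flag, wake the sleeper, wait for it to exit, return its code. */
int toy_kthread_stop(struct toy_kthread *k)
{
	pthread_mutex_lock(&k->lock);
	k->should_stop = 1;
	pthread_cond_broadcast(&k->wake);
	pthread_mutex_unlock(&k->lock);
	pthread_join(k->thread, NULL);	/* ~ wait_for_completion(vfork_done) */
	pthread_mutex_destroy(&k->lock);
	pthread_cond_destroy(&k->wake);
	return k->exit_code;
}
```

Because the stopper waits for thread exit rather than for an explicit
handshake, the thread may leave its loop at any point without breaking the
stop path, which is the property Oleg points out.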

>> @@ -218,7 +219,7 @@ static inline int has_pending_signals(sigset_t *signal, sigset_t *blocked)
>>  fastcall void recalc_sigpending_tsk(struct task_struct *t)
>>  {
>>  if (t->signal->group_stop_count > 0 ||
>> -(freezing(t)) ||
>> +(freezing(t)) || __kthread_should_stop(t) ||
>>  PENDING(&t->pending, &t->blocked) ||
>>  PENDING(&t->signal->shared_pending, &t->blocked))
>>  set_tsk_thread_flag(t, TIF_SIGPENDING);
>
> Aha, now I understand your point about interruptible sleeps (the previous
> message). What is the reason for this change?

This bit is so that TIF_SIGPENDING does not get cleared on us.
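The effect of that hunk can be shown with a small truth-table model.
Assuming, as the reply above implies, that recalc_sigpending_tsk() clears
TIF_SIGPENDING whenever none of its conditions hold, a stop request would be
wiped by the next recalculation unless __kthread_should_stop(t) is part of
the condition. The struct and function are hypothetical and only mirror the
shape of the quoted condition.

```c
#include <stdbool.h>

/* Hypothetical flattened inputs of recalc_sigpending_tsk(t). */
struct sig_state {
	bool group_stop;      /* t->signal->group_stop_count > 0 */
	bool freezing;        /* freezing(t) */
	bool kthread_stop;    /* __kthread_should_stop(t), added by the patch */
	bool signals_queued;  /* either PENDING(...) test is true */
};

/* Value TIF_SIGPENDING ends up with after a recalculation, with or
 * without the __kthread_should_stop() term. */
bool sigpending_after_recalc(const struct sig_state *s, bool with_patch)
{
	return s->group_stop || s->freezing ||
	       (with_patch && s->kthread_stop) || s->signals_queued;
}
```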

Eric


Re: [PATCH 3/3] make kthread_stop() scalable

2007-04-14 Thread Oleg Nesterov

On 04/14, Eric W. Biederman wrote:
>
> This is where I was going beyond what you were doing.  I needed a flag to say
> that this is a kthread that is stopping, to test in recalc_sigpending, to be
> certain of terminating interruptible sleeps.  I could not get at your struct kthread
> in that case.
> 
> If it wasn't for the wait_event_interruptible thing I likely would
> have just thrown a union in struct task_struct.
> 
> I also got lucky in that vfork_done is designed to point to a completion
> just where I need it (when a task exits).  The name is now a little
> abused but otherwise it does just what I want it to.
> 
> >> It also doesn't solve the biggest problem with the current kthread interface
> >> in that calling kthread_stop does not cause the code to break out of
> >> interruptible sleeps.
> >
> > Hm? kthread_stop() does wake_up_process(), it wakes up TASK_INTERRUPTIBLE tasks.
> 
> Yes. But if they are looping, unless signal_pending is set it is quite possible
> they will go back to sleep.
> 
> Take for example:
> 
> > #define __wait_event_interruptible(wq, condition, ret)             \
> > do {                                                               \
> >         DEFINE_WAIT(__wait);                                       \
> >                                                                    \
> >         for (;;) {                                                 \
> >                 prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \
> >                 if (condition)                                     \
> >                         break;                                     \
> >                 if (!signal_pending(current)) {                    \
> >                         schedule();                                \
> >                         continue;                                  \
> >                 }                                                  \
> >                 ret = -ERESTARTSYS;                                \
> >                 break;                                             \
> >         }                                                          \
> >         finish_wait(&wq, &__wait);                                 \
> > } while (0)
> 
> We don't break out until either condition is true or signal_pending(current)
> is true.
> 
> Loops that do that are very common in the kernel.  I counted about 500
> calls of signal_pending in places that otherwise care nothing about signals.
> Several kernel threads call into functions that use loops like
> wait_event_interruptible.  So I need a more forceful kthread_stop if
> I don't want to continue to use signals.

Yes, I got it after reading your next patches. OK, this change is probably good.
My question was: do we really want to force a kernel thread to exit if it
is waiting for an event in TASK_INTERRUPTIBLE state? Probably yes.
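The point about __wait_event_interruptible() is that a bare
wake_up_process() only makes the loop re-check: unless the condition has
become true or signal_pending() is set, it calls schedule() again. A small
replay of that control flow over a scripted sequence of checks shows the
three outcomes. struct wakeup and replay_wait() are hypothetical names, and
ERESTARTSYS is defined locally with the value 512 it has in the kernel's
include/linux/errno.h, since it is not exported to userspace.

```c
#define ERESTARTSYS 512

/* State the loop observes at one check (the first check happens
 * before the task ever sleeps). */
struct wakeup {
	int condition;       /* has the awaited condition become true? */
	int signal_pending;  /* is TIF_SIGPENDING set? */
};

/* Replays the loop: returns 0 when the condition is met, -ERESTARTSYS
 * when a signal is pending, or 1 if the script ends while the task
 * would still be asleep.  *sleeps counts the schedule() calls. */
int replay_wait(const struct wakeup *ev, int n, int *sleeps)
{
	int i;

	*sleeps = 0;
	for (i = 0; i < n; i++) {
		if (ev[i].condition)
			return 0;
		if (!ev[i].signal_pending) {
			(*sleeps)++;	/* schedule(); continue; */
			continue;
		}
		return -ERESTARTSYS;
	}
	return 1;
}
```

A plain wakeup with nothing changed just adds another sleep, which is why
the patch makes a stop request look like a pending signal.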

> > Yes, thanks... Can't understand how I was so stupid!!! thanks...
> >
> > Damn. We don't need 2 completions! just one.
> 
> Yep.  My second patch in this last round implements that.

Yes, I have read it. It is clearly better than mine, and I think correct.

Oleg.


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-14 Thread hui

On Sat, Apr 14, 2007 at 01:18:09AM +0200, Ingo Molnar wrote:
> very much so! Both Con and Mike have contributed regularly to upstream 
> sched.c:

The problem here is that Con can get demotivated (and rather upset) when an
idea gets proposed, like SchedPlug, only to have people be hostile to it
and then suddenly turn around and adopt this idea. It gives the impression
that you, in this specific case, were more interested in controlling a
situation and the track of development instead of actually being inclusive
of the development process with discussion and serious consideration, etc...

This is how the Linux community can be perceived as elitist. The old guard
would serve the community better if people were more mindful and sensitive
to developer issues. There was a particular speech at OLS 2006 that turned
me off, one that pretty much pandered to the "old guard's" needs over
newer developers. Since I'm a somewhat established engineer in -rt (being
the only other person that mapped the lock hierarchy out for full
preemptibility), I had the confidence to pretty much ignore it, whereas
previously this could have really upset me, and it could be highly
discouraging to a relatively new developer.

As Linux gets larger and larger, this is going to be an increasing problem
when folks come into the community with new ideas, and the community will
need to change if it intends to integrate these folks. IMO, a lot of
these flame wars wouldn't need to exist if folks listened to each other
better and permitted co-ownership of code like the scheduler, since it needs
multiple hands in it to adapt to new loads and situations, etc...

I'm saying this nicely now since I can be nasty about it.

bill


Re: GIT and the current -stable

2007-04-14 Thread Gerb Stralko


Please don't do this. Using the same name for a branch as for a tag is
madness. Call it "v2.6.20-stable" or anything else, but don't re-use the
same naming as for tags.



Yes, I have done this before, and it took me a while to realize what was
going on.  It caused me some grief, and a few hours of lost time... Of
course this was back in my earlier days ;)

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-14 Thread Eric W. Biederman

Willy Tarreau <[EMAIL PROTECTED]> writes:

> On Sat, Apr 14, 2007 at 07:54:33PM +0200, Ingo Molnar wrote:
>> 
>> * Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>> 
>> > > Thinking about it, I don't know if there are calls to schedule() 
>> > > while switching from tty1 to tty2. Alt-F2 had no effect anymore, and 
>> > > "chvt 2" simply blocked. It would have been possible that a 
>> > > schedule() call somewhere got starved due to the load, I don't know.
>> > 
>> > It looks like there is a call to schedule_work.
>> 
>> so this goes over keventd, right?
>> 
>> > There are two pieces of the path. If you are switching in and out of a
>> > tty controlled by something like X, user space has to grant
>> > permission before the operation happens.  Where there isn't a
>> > gatekeeper, I know it is cheaper, but I don't know by how much; I suspect
>> > there is still a schedule happening in there.
>> 
>> Could keventd perhaps be starved? Willy, to exclude this possibility, 
>> could you perhaps chrt keventd to RT priority? If events/0 is PID 5 then 
>> the command to set it to SCHED_FIFO:50 would be:
>> 
>>   chrt -f -p 50 5
>> 
>> but ... events/0 is reniced to -5 by default, so it should definitely 
>> not be starved.
>
> Well, since I merged the fair-fork patch, I cannot reproduce (in fact,
> bash forks 1000 processes, then progressively execs scheddos, but it
> takes some time). So I'm rebuilding right now. But I think that Linus
> has an interesting clue about GPM and notification before switching
> the terminal. I think it was enabled in console mode. I don't know
> how that translates to frozen xterms, but let's attack the problems
> one at a time.

I think it is a good clue.  However the intention of the mechanism is
that only processes that change the video mode on a VT are supposed to
use it.  So I really don't think gpm is the culprit.  However it easily could
be something else that has similar characteristics.

I just realized we do have proof that schedule_work is actually working,
because SAK works, and we can't sanely do SAK from interrupt context,
so we call schedule_work.

Eric

Re: [PATCH] kthread: Enhance kthread_stop to abort interruptible sleeps

2007-04-14 Thread Oleg Nesterov

On 04/13, Eric W. Biederman wrote:
>
> +static inline int __kthread_should_stop(struct task_struct *tsk)
> +{
> + return test_tsk_thread_flag(tsk, TIF_KTHREAD_STOP);
> +}

Am I blind? Where does copy_process/dup_task_struct clear unwanted
flags in thread_info->flags?

> +int kthread_stop(struct task_struct *tsk)
>  {
>   int ret;
>  
> - mutex_lock(&kthread_stop_lock);
> -
> - /* It could exit after stop_info.k set, but before wake_up_process. */
> - get_task_struct(k);
> + /* Ensure the task struct persists until I read the exit code. */
> + get_task_struct(tsk);
>  
> - /* Must init completion *before* thread sees kthread_stop_info.k */
> - init_completion(&kthread_stop_info.done);
> - smp_wmb();
> + set_tsk_thread_flag(tsk, TIF_KTHREAD_STOP);
> + spin_lock_irq(&tsk->sighand->siglock);
> + signal_wake_up(tsk, 1);
> + spin_unlock_irq(&tsk->sighand->siglock);

Off-topic again, the comment above signal_wake_up() is very confusing...
I think the only reason for ->siglock is to prevent the case when
TIF_SIGPENDING may be cleared by recalc_sigpending(). Or something else?

This changes the current API, kthread_stop() doesn't wake up UNINTERRUPTIBLE
tasks any longer. I'd say this is good, but may break things ... For example,
kthread_stop(kthread_create(...)) can't work now.

> - /* Now set kthread_should_stop() to true, and wake it up. */
> - kthread_stop_info.k = k;
> - wake_up_process(k);
> - put_task_struct(k);
> + wait_for_completion(tsk->vfork_done);
> + ret = tsk->exit_code;

This is really good. Now the kernel thread can exit() at any point without
fear of breaking kthread_stop().

> @@ -218,7 +219,7 @@ static inline int has_pending_signals(sigset_t *signal, sigset_t *blocked)
>  fastcall void recalc_sigpending_tsk(struct task_struct *t)
>  {
>   if (t->signal->group_stop_count > 0 ||
> - (freezing(t)) ||
> + (freezing(t)) || __kthread_should_stop(t) ||
>   PENDING(&t->pending, &t->blocked) ||
>   PENDING(&t->signal->shared_pending, &t->blocked))
>   set_tsk_thread_flag(t, TIF_SIGPENDING);

Aha, now I understand your point about interruptible sleeps (the previous
message). What is the reason for this change?

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 3/3] make kthread_stop() scalable

2007-04-14 Thread Eric W. Biederman

Oleg Nesterov <[EMAIL PROTECTED]> writes:

> On 04/13, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <[EMAIL PROTECTED]> writes:
>> 
>> > It's a shame kthread_stop() (may take a while!) runs with a global semaphore
>> > held. With this patch kthread() allocates all necessary data (struct kthread)
>> > on its own stack, globals kthread_stop_xxx are deleted.
>> 
>> Oleg, so far your patches have been inspiring.  However...
>> 
>> > HACKS:
>> >
>> >- re-use task_struct->set_child_tid to point to "struct kthread"
>> 
>>   task_struct->vfork_done is a better candidate.
>> 
>> >- use do_exit() directly to preserve "struct kthread" on stack
>> 
>> Calling do_exit directly like that is not a hack, as it appears the preferred
>> way to exit is to call do_exit, or complete_and_exit.
>> 
>> While this does improve the scalability and removes a global variable, it
>> also introduces a complex special case in the form of struct kthread.
>
> I can't say I agree. I thought it is good to have a struct which represents
> a kernel thread. Actually, I thought we can have __kthread_create() which
> returns "struct kthread". Maybe I am wrong, because yes, ->set_child_tid can
> point right to completion, and we can use some TIF flag instead of
> ->should_stop.

> This needs to update a lot of include/asm/ files.

Yes it does.

This is where I was going beyond what you were doing.  I needed a flag to
test in recalc_sigpending, saying that this is a kthread that is stopping, to
be certain of terminating interruptible sleeps.  I could not get at your
struct kthread in that case.

If it wasn't for the wait_event_interruptible thing I likely would
have just thrown a union in struct task_struct.

I also got lucky in that vfork_done is designed to point to a completion
just where I need it (when a task exits).  The name is now a little
abused, but otherwise it does just what I want it to.

>> It also doesn't solve the biggest problem with the current kthread interface
>> in that calling kthread_stop does not cause the code to break out of
>> interruptible sleeps.
>
> Hm? kthread_stop() does wake_up_process(), it wakes up TASK_INTERRUPTIBLE 
> tasks.

Yes. But if they are looping, unless signal_pending is set it is quite possible
they will go back to sleep.

Take for example:

> #define __wait_event_interruptible(wq, condition, ret)             \
> do {                                                               \
>         DEFINE_WAIT(__wait);                                       \
>                                                                    \
>         for (;;) {                                                 \
>                 prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \
>                 if (condition)                                     \
>                         break;                                     \
>                 if (!signal_pending(current)) {                    \
>                         schedule();                                \
>                         continue;                                  \
>                 }                                                  \
>                 ret = -ERESTARTSYS;                                \
>                 break;                                             \
>         }                                                          \
>         finish_wait(&wq, &__wait);                                 \
> } while (0)

We don't break out until either condition is true or signal_pending(current)
is true.

Loops that do that are very common in the kernel.  I counted about 500
calls of signal_pending() in places that otherwise care nothing about signals.
Several kernel threads call into functions that use loops like
wait_event_interruptible.  So, if I don't want to continue to use signals,
I need a more forceful kthread_stop.
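Eric's argument can be made concrete with a userspace model. The sketch below keeps the control flow of the __wait_event_interruptible() loop quoted above, but a pthread mutex and condition variable stand in for the wait queue and schedule(). struct fake_task and the fake_* helpers are hypothetical. A plain wakeup leaves the sleeper in its loop. The stop path folds the stop flag into the "signal pending" state, the way the patched recalc_sigpending_tsk() does, and so makes the sleeper return -ERESTARTSYS.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* The kernel's ERESTARTSYS is 512; it is not exported to userspace, so mirror it. */
#define FAKE_ERESTARTSYS 512

/* Hypothetical stand-in for the parts of task_struct the loop touches. */
struct fake_task {
	pthread_mutex_t lock;
	pthread_cond_t wq;      /* the wait queue the sleeper is parked on */
	bool condition;         /* the event the sleeper is really waiting for */
	bool should_stop;       /* plays the role of TIF_KTHREAD_STOP */
	unsigned long pending;  /* pending signal bits */
	unsigned long blocked;  /* blocked signal bits */
	bool sigpending;        /* plays the role of TIF_SIGPENDING */
	int wait_ret;           /* what the sleeper's wait returned */
};

void fake_task_init(struct fake_task *t)
{
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->wq, NULL);
	t->condition = false;
	t->should_stop = false;
	t->pending = 0;
	t->blocked = 0;
	t->sigpending = false;
	t->wait_ret = 1;   /* sentinel: the wait has not returned yet */
}

/* Shape of the patched recalc_sigpending_tsk(): a stop request counts as pending. */
void fake_recalc_sigpending(struct fake_task *t)
{
	t->sigpending = t->should_stop || (t->pending & ~t->blocked) != 0;
}

/* Same control flow as __wait_event_interruptible(); caller holds t->lock. */
static int fake_wait_event_interruptible(struct fake_task *t)
{
	for (;;) {
		if (t->condition)
			return 0;
		if (!t->sigpending) {
			pthread_cond_wait(&t->wq, &t->lock);  /* schedule() */
			continue;   /* a bare wakeup just goes back to sleep */
		}
		return -FAKE_ERESTARTSYS;
	}
}

void *fake_sleeper(void *arg)
{
	struct fake_task *t = arg;

	pthread_mutex_lock(&t->lock);
	t->wait_ret = fake_wait_event_interruptible(t);
	pthread_mutex_unlock(&t->lock);
	return NULL;
}

/* A plain wakeup, as wake_up_process() alone would give: the loop ignores it. */
void fake_plain_wakeup(struct fake_task *t)
{
	pthread_mutex_lock(&t->lock);
	pthread_cond_broadcast(&t->wq);
	pthread_mutex_unlock(&t->lock);
}

/* The forceful stop: set the flag, fold it into "signal pending", then wake. */
void fake_kthread_stop(struct fake_task *t)
{
	pthread_mutex_lock(&t->lock);
	t->should_stop = true;
	fake_recalc_sigpending(t);
	pthread_cond_broadcast(&t->wq);
	pthread_mutex_unlock(&t->lock);
}
```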

>> > @@ -91,7 +105,7 @@ static void create_kthread(struct kthrea
>> >
>> >/* We want our own signal handler (we take no signals by default). */
>> >pid = kernel_thread(kthread, create, CLONE_FS | CLONE_FILES | SIGCHLD);
>> > -  create->result = pid;
>> > +  create->result = ERR_PTR(pid);
>> 
>> Ouch. You have a nasty race here.
>> 
>> If kthread runs before kernel_thread returns then setting
>> "create->result = ERR_PTR(pid);" could easily stomp 
>> "create->result = ".
>
> Yes, thanks... Can't understand how I was so stupid!!! thanks...
>
> Damn. We don't need 2 completions! just one.

Yep.  My second patch in this last round implements that.

>   create_kthread:
>
>           pid = kernel_thread(...);
>           if (pid < 0) {
>                   create->result = ERR_PTR(pid);
>                   complete(create->started);
>           }
>           // else: kthread() will do complete()
>
>           return;

Eric


Re: Linux 2.6.21-rc6

2007-04-14 Thread Rafael J. Wysocki

On Saturday, 14 April 2007 15:00, Adrian Bunk wrote:
> On Sat, Apr 14, 2007 at 02:31:54PM +0200, Tobias Diedrich wrote:
> > Tobias Diedrich wrote:
> > > > ed746e3b18f4df18afa3763155972c5835f284c5 is first bad commit
> > > > commit ed746e3b18f4df18afa3763155972c5835f284c5
> > > > Author: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > > > Date:   Sat Feb 10 01:43:32 2007 -0800
> > > > 
> > > > [PATCH] swsusp: Change code ordering in disk.c
> > > > 
> > > > Change the ordering of code in kernel/power/disk.c so that device_suspend() is
> > > > called before disable_nonboot_cpus() and platform_finish() is called after
> > > > enable_nonboot_cpus() and before device_resume(), as indicated by the recent
> > > > discussion on Linux-PM (cf.
> > > > http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html).
> > > > 
> > > > The changes here only affect the built-in swsusp.
> > > > 
> > > > [EMAIL PROTECTED]: fix LED blinking during image load]
> > > > Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > > > Acked-by: Pavel Machek <[EMAIL PROTECTED]>
> > > > Cc: Greg KH <[EMAIL PROTECTED]>
> > > > Cc: Nigel Cunningham <[EMAIL PROTECTED]>
> > > > Cc: Patrick Mochel <[EMAIL PROTECTED]>
> > > > Cc: Alexey Starikovskiy <[EMAIL PROTECTED]>
> > > > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> > > > Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
> > > > 
> > > > :04 04 7eca5b3a8f9606bc4f2ff41192ec8c9d4ca90d18 8313b674e1d1bdf6849350af06d28a89b3bb3054 M  kernel
> > > > 
> > > > 
> > > > Now, the remaining test is to try reverting this commit from -rc6. :)
> > > 
> > > Doesn't apply cleanly against -rc6, but fixes the problem when
> > > reverted from -rc1.
> > 
> > Now, this was already reported in
> > http://lkml.org/lkml/2007/3/16/126
> > and I even flagged that message in my local folder, but apparently forgot
> > to follow up on it... *sigh*
> 
> Unless I misunderstood something, all of the problems Maxim described in 
> this email are fixed for him in -rc6.
> 
> But it's quite possible that you are running into a different issue 
> exposed by this commit.

Yes, it's likely.

Tobias, I'm unable to reproduce the problem with your .config, but my hardware
is certainly different.  Which suspend mode do you use?  If that's "platform",
can you try to use "shutdown" or "reboot" and see if that helps?

Rafael


-- 
If you don't have the time to read,
you don't have the time or the tools to write.
- Stephen King

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-14 Thread Willy Tarreau

On Sat, Apr 14, 2007 at 07:54:33PM +0200, Ingo Molnar wrote:
> 
> * Eric W. Biederman <[EMAIL PROTECTED]> wrote:
> 
> > > Thinking about it, I don't know if there are calls to schedule() 
> > > while switching from tty1 to tty2. Alt-F2 had no effect anymore, and 
> > > "chvt 2" simply blocked. It would have been possible that a 
> > > schedule() call somewhere got starved due to the load, I don't know.
> > 
> > It looks like there is a call to schedule_work.
> 
> so this goes over keventd, right?
> 
> > There are two pieces of the path. If you are switching in and out of a
> > tty controlled by something like X, user space has to grant
> > permission before the operation happens.  Where there isn't a
> > gatekeeper I know it is cheaper, but I don't know by how much; I suspect
> > there is still a schedule happening in there.
> 
> Could keventd perhaps be starved? Willy, to exclude this possibility, 
> could you perhaps chrt keventd to RT priority? If events/0 is PID 5 then 
> the command to set it to SCHED_FIFO:50 would be:
> 
>   chrt -f -p 50 5
> 
> but ... events/0 is reniced to -5 by default, so it should definitely 
> not be starved.

Well, since I merged the fair-fork patch, I cannot reproduce (in fact,
bash forks 1000 processes, then progressively execs scheddos, but it
takes some time). So I'm rebuilding right now. But I think that Linus
has an interesting clue about GPM and notification before switching
the terminal. I think it was enabled in console mode. I don't know
how that translates to frozen xterms, but let's attack the problems
one at a time.

Willy


Re: [PATCH 3/3] make kthread_stop() scalable

2007-04-14 Thread Oleg Nesterov

On 04/13, Eric W. Biederman wrote:
>
> Oleg Nesterov <[EMAIL PROTECTED]> writes:
> 
> > It's a shame kthread_stop() (may take a while!) runs with a global semaphore
> > held. With this patch kthread() allocates all necessary data (struct kthread)
> > on its own stack, globals kthread_stop_xxx are deleted.
> 
> Oleg, so far your patches have been inspiring.  However...
> 
> > HACKS:
> >
> > - re-use task_struct->set_child_tid to point to "struct kthread"
> 
>    task_struct->vfork_done is a better candidate.
> 
> > - use do_exit() directly to preserve "struct kthread" on stack
> 
> Calling do_exit directly like that is not a hack, as it appears the preferred
> way to exit is to call do_exit, or complete_and_exit.
> 
> While this does improve the scalability and removes a global variable, it
> also introduces a complex special case in the form of struct kthread.

I can't say I agree. I thought it is good to have a struct which represents
a kernel thread. Actually, I thought we can have __kthread_create() which
returns "struct kthread". Maybe I am wrong, because yes, ->set_child_tid can
point right to completion, and we can use some TIF flag instead of
->should_stop. This needs to update a lot of include/asm/ files.

> It also doesn't solve the biggest problem with the current kthread interface
> in that calling kthread_stop does not cause the code to break out of
> interruptible sleeps.

Hm? kthread_stop() does wake_up_process(), it wakes up TASK_INTERRUPTIBLE tasks.

> > @@ -91,7 +105,7 @@ static void create_kthread(struct kthrea
> >
> > /* We want our own signal handler (we take no signals by default). */
> > pid = kernel_thread(kthread, create, CLONE_FS | CLONE_FILES | SIGCHLD);
> > -   create->result = pid;
> > +   create->result = ERR_PTR(pid);
> 
> Ouch. You have a nasty race here.
> 
> If kthread runs before kernel_thread returns then setting
> "create->result = ERR_PTR(pid);" could easily stomp 
> "create->result = ".

Yes, thanks... Can't understand how I was so stupid!!! thanks...

Damn. We don't need 2 completions! just one.

create_kthread:

        pid = kernel_thread(...);
        if (pid < 0) {
                create->result = ERR_PTR(pid);
                complete(create->started);
        }
        // else: kthread() will do complete()

        return;
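The single-completion scheme above works because exactly one side writes create->result: the creator on failure, the new thread on success. A userspace sketch of that rule with POSIX threads follows. The completion type is a hand-rolled mutex/condition-variable imitation of the kernel's. struct create_info and the fake_* names are hypothetical, and the "result" is simply 1 for a started thread or a negative errno.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hand-rolled imitation of the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical analogue of the kthread creation request. */
struct create_info {
	struct completion started;
	int result;        /* 1 if the thread started, else -errno */
	bool fail_spawn;   /* demo knob: pretend thread creation failed */
	pthread_t thread;
};

static void *fake_kthread(void *arg)
{
	struct create_info *create = arg;

	create->result = 1;          /* success: only the new thread writes result */
	complete(&create->started);  /* in the kernel, create lives on the caller's stack */
	return NULL;
}

static void fake_create_kthread(struct create_info *create)
{
	int err = create->fail_spawn ? EAGAIN
		: pthread_create(&create->thread, NULL, fake_kthread, create);

	if (err != 0) {
		create->result = -err;  /* failure: only the creator writes result */
		complete(&create->started);
	}
	/* else: fake_kthread() does the complete() */
}

int fake_kthread_create(struct create_info *create, bool fail_spawn)
{
	init_completion(&create->started);
	create->result = 0;
	create->fail_spawn = fail_spawn;
	fake_create_kthread(create);
	wait_for_completion(&create->started);
	if (create->result > 0)
		pthread_join(create->thread, NULL);  /* reap the demo thread */
	return create->result;
}
```

Had the creator also assigned result after pthread_create() returned, as the racy hunk did after kernel_thread(), it could overwrite the value the new thread had already stored.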

Oleg.


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-14 Thread Ingo Molnar


* Eric W. Biederman <[EMAIL PROTECTED]> wrote:

> > Thinking about it, I don't know if there are calls to schedule() 
> > while switching from tty1 to tty2. Alt-F2 had no effect anymore, and 
> > "chvt 2" simply blocked. It would have been possible that a 
> > schedule() call somewhere got starved due to the load, I don't know.
> 
> It looks like there is a call to schedule_work.

so this goes over keventd, right?

> There are two pieces of the path. If you are switching in and out of a
> tty controlled by something like X, user space has to grant
> permission before the operation happens.  Where there isn't a
> gatekeeper I know it is cheaper, but I don't know by how much; I suspect
> there is still a schedule happening in there.

Could keventd perhaps be starved? Willy, to exclude this possibility, 
could you perhaps chrt keventd to RT priority? If events/0 is PID 5 then 
the command to set it to SCHED_FIFO:50 would be:

  chrt -f -p 50 5

but ... events/0 is reniced to -5 by default, so it should definitely 
not be starved.
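For reference, what `chrt -f -p 50 5` asks the kernel for is essentially a sched_setscheduler(2) call with SCHED_FIFO. A minimal C equivalent is sketched below. set_fifo_priority() is a hypothetical helper name, and the call needs sufficient privilege (root or CAP_SYS_NICE) to succeed.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/* Roughly what `chrt -f -p <prio> <pid>` does: switch pid to SCHED_FIFO at prio. */
int set_fifo_priority(pid_t pid, int prio)
{
	struct sched_param param = { .sched_priority = prio };

	/* Reject priorities outside the range the kernel reports for SCHED_FIFO. */
	if (prio < sched_get_priority_min(SCHED_FIFO) ||
	    prio > sched_get_priority_max(SCHED_FIFO)) {
		errno = EINVAL;
		return -1;
	}
	return sched_setscheduler(pid, SCHED_FIFO, &param);
}
```

With events/0 at PID 5, set_fifo_priority(5, 50) would correspond to the command above.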

Ingo

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-14 Thread Linus Torvalds

On Sat, 14 Apr 2007, Willy Tarreau wrote:
> 
> It is clearly possible. What I found strange is that I could still fork
> processes (eg: ls, dmesg|tail), ... but not switch to another VT anymore.

Considering the patches in question, it's almost definitely just a CPU 
scheduling problem with starvation.

The VT switching is obviously done by the kernel, but the kernel will 
signal and wait for the "controlling process" for the VT. The most obvious 
case of that is X, of course, but even in text mode I think gpm will 
have taken control of the VT's it runs on (all of them), which means that 
when you initiate a VT switch, the kernel will actually signal the 
controlling process (gpm), and wait for it to acknowledge the switch.
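The controlling-process handshake Linus describes is set up through the VT_SETMODE ioctl from <linux/vt.h>. In VT_PROCESS mode the kernel sends relsig to the owner when a switch away is requested, and the switch proceeds only once the owner answers with VT_RELDISP. A sketch of the owner's side follows. The helper names are hypothetical, and whether gpm actually does this is, as Linus says, unverified.

```c
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vt.h>

/* Describe a VT owned in process mode: switches need our acknowledgement. */
void make_process_vt_mode(struct vt_mode *vm, int relsig, int acqsig)
{
	memset(vm, 0, sizeof(*vm));
	vm->mode = VT_PROCESS;
	vm->relsig = relsig;   /* sent when someone wants to switch away */
	vm->acqsig = acqsig;   /* sent when the VT is switched back to us */
}

/* Become the controlling process of the VT open on fd (needs a console fd). */
int claim_vt(int fd, int relsig, int acqsig)
{
	struct vt_mode vm;

	make_process_vt_mode(&vm, relsig, acqsig);
	return ioctl(fd, VT_SETMODE, &vm);
}

/*
 * Called after relsig arrives: 1 lets the pending switch proceed.  If the
 * owner never gets CPU time to issue this, the switch stays pending, which
 * would be consistent with "chvt 2" simply blocking.
 */
int allow_switch_away(int fd)
{
	return ioctl(fd, VT_RELDISP, 1);
}
```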

If gpm doesn't get a timeslice for some reason (and it sounds like there 
may be some serious unfairness after "fork()"), your behaviour is 
explainable.

(NOTE! I've never actually looked at gpm sources or what it really does, 
so maybe I'm wrong, and it doesn't try to do the controlling VT thing, and 
something else is going on, but quite frankly, it sounds like the obvious 
candidate for this bug. Explaining it with some non-scheduler-related 
thing sounds unlikely, considering the patch in question).

Linus

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-14 Thread Eric W. Biederman

Willy Tarreau <[EMAIL PROTECTED]> writes:

> Hi Eric,
>
> [...]
>> >> the ramp up slows down after 700-800 processes, but something very 
>> >> strange happens. If I'm under X, I can switch the focus to all xterms 
>> >> (the WM is still alive) but all xterms are frozen. On the console, 
>> >> after one moment I simply cannot switch to another VT anymore while I 
>> >> can still start commands locally. But "chvt 2" simply blocks. SysRq-K 
>> >> killed everything and restored full control. Dmesg shows lots of :
>> >
>> >> SAK: killed process  (scheddos2): process_session(p)==tty->session.
>> 
>> This.  Yes. SAK is noisy and tells you everything it kills.
>
> OK, that's what I suspected, but I did not know if the fact that it talked
> about the session was systematic or related to any particular state when it
> killed the task.
>
>> >> I wonder if part of the problem would be too many processes bound to 
>> >> the same tty :-/
>> >
>> > hm, that's really weird. I've Cc:-ed the tty experts (Erik, Jiri, Alan), 
>> > maybe this description rings a bell with them?
>> 
>> Is there any swapping going on?
>
> Not at all.
>
>> I'm inclined to suspect that it is a problem that has more to do with the
>> number of processes and has nothing to do with ttys.
>
> It is clearly possible. What I found strange is that I could still fork
> processes (eg: ls, dmesg|tail), ... but not switch to another VT anymore.
> It first happened under X with frozen xterms but a perfectly usable WM,
> then I reproduced it on pure console to rule out any potential X problem.
>
>> Anyway you can easily rule out ttys by having your startup program
>> detach from a controlling tty before you start everything.
>> 
>> I'm more inclined to guess something is reading /proc a lot, or doing
>> something that holds the tasklist lock, a lot or something like that,
>> if the problem isn't that you are being kicked into swap.
>
> Oh I'm sorry you were invited into the discussion without a first description
> of the context. I was giving a try to Ingo's new scheduler, and trying to
> reach corner cases with lots of processes competing for CPU.
>
> I simply used a "for" loop in bash to fork 1000 processes, and this problem
> happened between 700-800 children. The program only uses a busy loop and a
> pause. I then changed my program to close 0,1,2 and perform the fork itself,
> and the problem vanished. So there are two differences here :
>
>   - bash not forking anymore
>   - far less FDs on /dev/tty1

Yes.  But with /dev/tty1 being the controlling terminal in both cases,
as you haven't dropped your session, or disassociated your tty.
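Detaching the load generator from its controlling terminal, as Eric suggests, would typically be done with setsid(2), which starts a new session with no controlling tty. A minimal sketch follows, assuming the caller is a freshly forked child, since setsid() fails with EPERM in a process-group leader. detach_from_tty() is a hypothetical helper name, and redirecting stdio to /dev/null mirrors Willy's closing of fds 0, 1 and 2.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Leave the current session (and with it the controlling tty), then
 * point stdin/stdout/stderr at /dev/null so no fds refer to the tty.
 * Returns 0 on success, -1 on failure.
 */
int detach_from_tty(void)
{
	int fd;

	if (setsid() < 0)   /* EPERM if we already lead a process group */
		return -1;
	fd = open("/dev/null", O_RDWR);
	if (fd < 0)
		return -1;
	for (int i = 0; i <= 2; i++)
		if (dup2(fd, i) < 0)
			return -1;
	if (fd > 2)
		close(fd);
	return 0;
}
```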

The bash problem may have something to do with setpgid or scheduling effects.
Hmm.  I just looked and setpgid does grab the tasklist lock for
writing so we may possibly have some contention there.

> At first, I had around 2200 fds on /dev/tty1, reason why I suspected something
> in this area.
>
> I agree that this is not normal usage at all, I'm just trying to attack
> Ingo's scheduler to ensure it is more robust than the stock one. But
> sometimes brute force methods can make other sleeping problems pop up.

Yep.  If we can narrow it down to one, that would be interesting.  Of course,
when we start finding other possibly sleeping problems, that also means people
are working in areas of code they don't normally touch, so we must investigate.

> Thinking about it, I don't know if there are calls to schedule() while
> switching from tty1 to tty2. Alt-F2 had no effect anymore, and "chvt 2"
> simply blocked. It would have been possible that a schedule() call
> somewhere got starved due to the load, I don't know.

It looks like there is a call to schedule_work.

There are two pieces of the path. If you are switching in and out of a tty
controlled by something like X, user space has to grant permission before
the operation happens.  Where there isn't a gatekeeper I know it is cheaper,
but I don't know by how much; I suspect there is still a schedule happening
in there.

Eric