Re: PROBLEM: "Make nenuconfig" does not save parameters.
[Bodo Eggert - Sun, Mar 11, 2007 at 06:21:59AM +0100]
| Sam Ravnborg <[EMAIL PROTECTED]> wrote:
| > On Sat, Mar 10, 2007 at 10:34:41PM +0100, Jan Engelhardt wrote:
| >> On Mar 10 2007 22:27, Sam Ravnborg wrote:
| >> >On Sat, Mar 10, 2007 at 07:23:41PM +0100, Jan Engelhardt wrote:
|
| >> >> Whether the 'working config file path' should change when you do
| >> >> 'Save as Alternate' or not, is a menuconfig axiom. Ask Sam Ravnborg
| >> >> if you want it changed :-)
| >> >
| >> >Current behaviour is not logical but on the other hand I do not
| >> >see a big need to make it so.
| >> >It seems that people very seldom uses "save alternate" anyway.
| >> >
| >> >But patches are welcome.
| >>
| >> ^_^ The patch has already been posted, has not it?
| > No.
| > Either we keep current behaviour
|
| , which is misleading,
|
| > or we change to the "normal"
| > behaviour with a "Save as..." as know from all other programs.
|
| , which is not desirable, as long as there is no "open" and "save" option
| also working as "normal".
|
| IMO the option should have the "Save a copy" semantics, since that's what the
| name suggests.

Please describe how it should work. I mean, should we work with a single
_active_ file, with "Save a copy" just writing a config snapshot to some
other file without affecting the original file? Or should we work with
any config file, as text editors do? Just write up your point of view in
detail and give the kernel community time to review...

| --
| Top 100 things you don't want the sysadmin to say:
| 51. YEEEHA!!! What a CRASH!!!
|
| Friß, Spammer: [EMAIL PROTECTED] [EMAIL PROTECTED]
|

Cyrill
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [QUICKLIST 4/6] x86_64: Single Quicklist
On Sunday 11 March 2007 03:09, Christoph Lameter wrote:
> x86_64: Convert to use a single quicklists
>
> This adds caching of pgds and puds, pmds, pte. That way we can
> avoid costly zeroing and initialization of special mappings in the
> pgd.
>
> The first patch just adds a simple implementation using a single
> quicklist. As a consequence we need to zero a pgd before returning
> it to the pool.

This and the i386 version are OK with me, although it might be better to
just finish __GFP_ZERO support to do this.

-Andi
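
[Editor's sketch] The quoted patch description says that with a single quicklist a pgd has to be zeroed before it goes back to the pool. A minimal Python model of that idea follows. It is not kernel code: the class name `Quicklist`, the `fresh_allocations` counter and the page size are assumptions made only for this illustration.

```python
class Quicklist:
    """Toy free list of page-table pages; every cached page is stored zeroed."""

    def __init__(self, page_size=4096):
        self.page_size = page_size
        self.free = []                # cached pages, all zero-filled
        self.fresh_allocations = 0    # how often we had to create a new page

    def alloc(self):
        if self.free:
            # Fast path: the cached page is already zeroed, nothing to clear.
            return self.free.pop()
        self.fresh_allocations += 1
        return bytearray(self.page_size)   # bytearray(n) starts zero-filled

    def release(self, page):
        # One pool serves every page-table level, so whatever was stored in
        # the page is cleared before it is put back.
        page[:] = bytes(self.page_size)
        self.free.append(page)
```

Because the pool is shared, the cost of clearing moves from allocation time to release time, which is the trade-off the quoted description points at.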
Re: netconsole system freeze when cable unplugged
On Sat, Mar 10, 2007 at 02:06:28PM +, Simon Arlott wrote:
> On 10/03/07 13:38, Andi Kleen wrote:
> >Simon Arlott <[EMAIL PROTECTED]> writes:
> >
> >>On 09/03/07 20:42, Francois Romieu wrote:
> >>>Simon Arlott <[EMAIL PROTECTED]> :
> >>>> When I unplug the cable the system just stops responding to
> >>>> anything, at all. No message is printed to the console when the
> >>>> cable is plugged back in.
> >>>rtl8139_interrupt (spin_lock(&tp->lock))
> >>>-> rtl8139_weird_interrupt
> >>>   -> rtl_check_media
> >>>      -> mii_check_media (printk(KERN_INFO "%s: link down\n", ...))
> >>>         [netpoll stuff here]
> >>>         -> rtl8139_poll_controller
> >>>            -> rtl8139_interrupt
> >>>               *deadlock*
> >>>See below for my random stuff of the day. Feel free to open a PR at
> >>>bugzilla.kernel.org if the issue does not go away.
> >>The patch doesn't fix it, nothing changes. I'm not sure how this can
> >>be debugged if printk won't work...
> >
> >earlyprintk can be called directly (early_printk()) and should
> >work. It won't log over the network of course.
>
> It also won't log over the serial console either :(

It does, you just have to configure it properly.

earlyprintk=serial,ttySx,baud

-Andi
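
[Editor's sketch] The call chain quoted from Francois Romieu is a self-deadlock: the interrupt handler takes a non-recursive lock, logs a message, and the logging path re-enters the same handler. The Python model below mirrors only that shape. The function names are stand-ins chosen for this sketch, and `acquire(blocking=False)` is used so the re-entry is reported instead of hanging the example.

```python
import threading


def simulate_link_down():
    """Replay the quoted call chain with a non-recursive lock."""
    lock = threading.Lock()   # stands in for the driver's spinlock
    trace = []

    def interrupt(link_down):
        # A real spinlock would spin here forever; we only record the event.
        if not lock.acquire(blocking=False):
            trace.append("re-entered with lock held: would deadlock")
            return
        try:
            if link_down:
                printk("link down")
        finally:
            lock.release()

    def printk(msg):
        trace.append("printk: " + msg)
        poll_controller()     # console output goes out through netpoll

    def poll_controller():
        interrupt(link_down=False)   # polling calls the handler directly

    interrupt(link_down=True)
    return trace
```

Calling `simulate_link_down()` records the log message first and the blocked re-entry second, which is the order shown in the quoted chain.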
Re: PROBLEM: "Make nenuconfig" does not save parameters.
[Sam Ravnborg - Sat, Mar 10, 2007 at 11:45:34PM +0100]
| On Sat, Mar 10, 2007 at 10:34:41PM +0100, Jan Engelhardt wrote:
| >
| > On Mar 10 2007 22:27, Sam Ravnborg wrote:
| > >On Sat, Mar 10, 2007 at 07:23:41PM +0100, Jan Engelhardt wrote:
| > >>
| > >> Whether the 'working config file path' should change when you do
| > >> 'Save as Alternate' or not, is a menuconfig axiom. Ask Sam Ravnborg
| > >> if you want it changed :-)
| > >
| > >Current behaviour is not logical but on the other hand I do not
| > >see a big need to make it so.
| > >It seems that people very seldom uses "save alternate" anyway.
| > >
| > >But patches are welcome.
| >
| > ^_^ The patch has already been posted, has not it?
| No.
| Either we keep current behaviour or we change to the "normal"
| behaviour with a "Save as..." as know from all other programs.
|
| Sam
|

Hi Sam,

I think we should go with the "Save As..." idea. That means menuconfig,
qconfig and gconfig will all be affected. Please give me time to make the
patches. (I'm a little busy now, so I hope to make them within a week :)

The patch I sent to Vladimir does not normalize the behaviour of the
config process; it just writes the alternate file as a snapshot of the
current config state. Please treat it as a temporary solution only.

Cyrill
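
[Editor's sketch] The two behaviours discussed in this thread differ only in whether the working file path changes after the write. The small Python model below shows that difference. It is not menuconfig code: `ConfigSession`, its methods, the file names and the `CONFIG_EXAMPLE` symbol are hypothetical and exist only for this illustration.

```python
import os
import tempfile


class ConfigSession:
    """Toy config editor with one working path and a dict of symbols."""

    def __init__(self, path, values=None):
        self.path = path
        self.values = dict(values or {})

    def _write(self, path):
        with open(path, "w") as f:
            for key in sorted(self.values):
                f.write(f"{key}={self.values[key]}\n")

    def save(self):
        self._write(self.path)

    def save_copy(self, other):
        # Snapshot semantics: write elsewhere, keep working on self.path.
        self._write(other)

    def save_as(self, other):
        # Editor-style semantics: write elsewhere and switch to that file.
        self._write(other)
        self.path = other


def demo():
    with tempfile.TemporaryDirectory() as d:
        session = ConfigSession(os.path.join(d, ".config"),
                                {"CONFIG_EXAMPLE": "y"})
        alt = os.path.join(d, "alt.config")
        session.save_copy(alt)
        after_copy = os.path.basename(session.path)
        session.save_as(alt)
        after_save_as = os.path.basename(session.path)
        return after_copy, after_save_as
```

In this model a later `save()` goes to the original file after `save_copy()` and to the alternate file after `save_as()`, which is the distinction the thread is arguing about.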
Re: RSDL-mm 0.28
On Sat, Mar 10, 2007 at 07:35:06PM -0600, Matt Mackall wrote:
> I've tested -mm2 against -mm2+noyield and -mm2+rsdl+noyield. The
> noyield patch simply makes the sched_yield syscall return immediately.
> Xorg and all tests are run at nice 0.

[skipped long and precise test report]

> Also note I could occasionally trigger nasty multi-second pauses with
> -mm2+noyield under exectest that didn't show up elsewhere. That's
> probably a bug in the mainline scheduler.

This is not a bug per se, but more of a design problem. It is caused by
the interactivity booster, which is unfair. Mike Galbraith and others
spent a lot of time trying to get rid of those problems a few versions
ago. In early kernels (around 2.6.11), I could trivially cause pauses
more than 30 seconds long by running a few tasks simulating an
interactive workload. It is much more difficult to achieve this with
recent kernels, and it has absolutely no effect on RSDL, which is one of
the reasons I find it great!

Regards,
Willy
Re: [PATCH] CIRRUS: Delete unused header file.
> On Sat, 10 Mar 2007 17:27:44 -0500 (EST) "Robert P. J. Day" <[EMAIL PROTECTED]> wrote:
>
> Delete apparently unused header file
> sound/pci/cs46xx/imgs/cwcemb80.h.
>

That patch series was rather a mess

- Multiple patches with the same Subject: (I might have lost some as a
  result)

- Several patches which tried to remove the same header file

- Several patches which simply didn't apply

- Inconsistent changelogging, inconsistent titling

- Lack of sequence numbering (again, contributes to possible patch loss)

- Useless indenting in changelog text which I have to edit away.

We have good tools (ie: quilt) which make this sort of thing nice and easy
to get right - please use them.

I didn't check that these headers are indeed unused. I hope you got that
minor part right..
libata extension
Good day,

Say I want to implement an extended set of ATA commands available to
userspace for building diagnostic tools. For example, I need 0x40 (read
verify) and 0x32 (write long) with error handling. I tried the ide driver
through ioctls, but it seems to lack functionality and is full of
gotchas. Furthermore, it sometimes oopses.

Is it possible to use libata for this purpose, or do I need to write a
separate IDE driver? By the way, I'm sure it should be done in kernel
space, since I'm going to deal with some HDD manufacturer commands.

P.S. I have been looking through the libata and ide sources and
documentation but still don't have the broad picture.

Thanks
Re: [SLUB 0/3] SLUB: The unqueued slab allocator V5
On Sat, 10 Mar 2007, Andrew Morton wrote:

> Is this safe to think about applying yet?

It's safe. By default kernels will be built with SLAB. SLUB becomes only
a selectable alternative. It should not become the primary slab until we
know that it's really superior overall and have thoroughly tested it in
a variety of workloads.

> We lost the leak detector feature.

There will be numerous small things that will have to be addressed. There
is also some minor work to be done for tracking callers better.

> It might be nice to create synonyms for PageActive, PageReferenced and
> PageError, to make things clearer in the slub core. At the expense of
> making things less clear globally. Am unsure.

I have been back and forth on doing that. They are somewhat similar in
what they mean for SLUB. But creating synonyms may be confusing to those
checking how page flags are being used.
Re: [SLUB 0/3] SLUB: The unqueued slab allocator V5
Is this safe to think about applying yet?

We lost the leak detector feature.

It might be nice to create synonyms for PageActive, PageReferenced and
PageError, to make things clearer in the slub core. At the expense of
making things less clear globally. Am unsure.
Re: RSDL-mm 0.28
On Sunday 11 March 2007 15:03, Matt Mackall wrote:
> On Sat, Mar 10, 2007 at 10:01:32PM -0600, Matt Mackall wrote:
> > On Sun, Mar 11, 2007 at 01:28:22PM +1100, Con Kolivas wrote:
> > > Ok I don't think there's any actual accounting problem here per se
> > > (although I did just recently post a bugfix for rsdl however I think
> > > that's unrelated). What I think is going on in the ccache testcase is
> > > that all the work is being offloaded to kernel threads reading/writing
> > > to/from the filesystem and the make is not getting any actual cpu
> > > time.
> >
> > I don't see significant system time while this is happening.
>
> Also, it's running pretty much entirely out of page cache so there
> wouldn't be a whole lot for kernel threads to do.

Well I can't reproduce that behaviour here at all whether from disk or the
pagecache with ccache, so I'm not entirely sure what's different at your
end. However both you and the other person reporting bad behaviour were
using ATI drivers. That's about the only commonality? I wonder if they do
need to yield... somewhat instead of not at all.

--
-ck
Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...
On Sat, 2007-03-10 at 21:31 -0800, Linus Torvalds wrote:
>
> On Sat, 10 Mar 2007, Nicholas Miell wrote:
> >
> > Ah, I see. You're just interested in fds as a generic handle concept,
> > and not a more Plan 9 type thing.
>
> Indeed. It's a "handle".
>
> UNIX has pid's for "process" handles, and "file descriptors" for just
> about everything else.

And I imagine that somebody will come up with a way of getting a fd for a
process sooner or later.

> > If that's the goal, somebody should start thinking about reducing the
> > contents of struct file to the bare minimum (i.e. not much more than a
> > file_operations pointer).
>
> Well, there's more there, but it really is fairly close. If you look at
> it, a "struct file" ends up not having a lot more than the minimal stuff
> required to use it as a handle: it really isn't a very big structure.
>
> The biggest part is actually the read-ahead state, which is arguably a
> generic thing for a file handle, even though not all kinds will be able to
> use it. We *could* make that be behind a pointer (along with the "f_pos"
> thing, that really logically goes along with the read-ahead thing), of
> course, but since most files probably do end up being "traditional file"
> structures, it's probably not wrong to just have it in the file.
>

Actually, I was thinking of reducing struct file to the bare minimum, and
then using that as the common header shared by object-specific
structures. I don't know how unpleasant that would be from a memory
allocation perspective, though.

--
Nicholas Miell <[EMAIL PROTECTED]>
Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...
On Sat, 10 Mar 2007, Linus Torvalds wrote:

> > Actually, the only place where I can find the itimerspec usefull, is
> > indeed with TFD_TIMER_SEQ. In cases where you want you clock starting at a
> > given time (it_value) *and* with the given frequency (it_interval).
>
> .. and this is where itimerspec is even better: once you have absolute
> time, *and* a process that might miss ticks (because it does something
> else), the "absolute time start + interval" thing can avoid drifting
> (which a "relative interval" has a really hard time doing).
>
> So if you want a "timer tick every second, *on* the second" kind of
> interface, you really do want a absolute time starting point, and then a
> fixed interval. Two different times.

Alrighty, I'll use an itimerspec ...

- Davide
Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...
On Sat, 10 Mar 2007, Davide Libenzi wrote:
> On Sat, 10 Mar 2007, Linus Torvalds wrote:
>
> > (That said, using "struct itimerspec" might be a good idea. That would
> > also obviate the need for TFD_TIMER_SEQ, since an itimerspec automatically
> > has both "base" and "incremental" parts).
>
> But TFD_TIMER_SEQ is a simple auto-rearm case of TFD_TIMER_REL. So the
> timespec is sufficent too (in all three cases we just need *one* time).

Well, people actually do use itimers like "give me a timer every second,
starting five seconds from now".

> Actually, the only place where I can find the itimerspec usefull, is
> indeed with TFD_TIMER_SEQ. In cases where you want you clock starting at a
> given time (it_value) *and* with the given frequency (it_interval).

.. and this is where itimerspec is even better: once you have absolute
time, *and* a process that might miss ticks (because it does something
else), the "absolute time start + interval" thing can avoid drifting
(which a "relative interval" has a really hard time doing).

So if you want a "timer tick every second, *on* the second" kind of
interface, you really do want an absolute time starting point, and then a
fixed interval. Two different times.

		Linus
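
[Editor's sketch] The drift argument above can be checked with plain arithmetic. The Python functions below compare a timer that is re-armed "one interval from now" after each late wakeup with one whose expiries are fixed at start + k * interval. Times are integer milliseconds, and the wakeup latencies are invented numbers for illustration, not measurements.

```python
def fire_times_relative(start, interval, latencies):
    """Re-arm 'interval from now' after every wakeup: lateness accumulates."""
    times, now = [], start
    for late in latencies:
        now = now + interval + late   # each wakeup slips by its own latency
        times.append(now)
    return times


def fire_times_absolute(start, interval, latencies):
    """Expiry k is start + k * interval: lateness affects one wakeup only."""
    return [start + (k + 1) * interval + late
            for k, late in enumerate(latencies)]


LATENCIES_MS = [30, 10, 50]   # made-up scheduling delays
RELATIVE = fire_times_relative(0, 1000, LATENCIES_MS)
ABSOLUTE = fire_times_absolute(0, 1000, LATENCIES_MS)
```

With these inputs the relative timer's third wakeup is 90 ms past the ideal 3000 ms mark (the sum of all latencies), while the absolute one is only 50 ms past it (the last latency alone).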
Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...
On Sat, 10 Mar 2007, Nicholas Miell wrote:
>
> Ah, I see. You're just interested in fds as a generic handle concept,
> and not a more Plan 9 type thing.

Indeed. It's a "handle".

UNIX has pid's for "process" handles, and "file descriptors" for just
about everything else.

> If that's the goal, somebody should start thinking about reducing the
> contents of struct file to the bare minimum (i.e. not much more than a
> file_operations pointer).

Well, there's more there, but it really is fairly close. If you look at
it, a "struct file" ends up not having a lot more than the minimal stuff
required to use it as a handle: it really isn't a very big structure.

The biggest part is actually the read-ahead state, which is arguably a
generic thing for a file handle, even though not all kinds will be able to
use it. We *could* make that be behind a pointer (along with the "f_pos"
thing, that really logically goes along with the read-ahead thing), of
course, but since most files probably do end up being "traditional file"
structures, it's probably not wrong to just have it in the file.

> It'd be useful if the polling interfaces could return small datums
> beyond just the POLL* flags -- having to do a read on timerfd just to
> get the overrun count has a lot of overhead for just an integer, and I
> imagine other things would like to pass back stuff too.

Well, since a lot of the interfaces harken back to "select()", we really
are stuck with basically a couple of bits total (poll extends on the
number of bits, but not a whole lot).

So right now we have just "an event happened", and if you want to know
more, you do need to do a read() or similar. That's true of all the
traditional file descriptors too, of course.

> You still want timeouts, creating/setting/destroying at timer just for
> a single call to select/poll/epoll is probably too heavy weight.

Well, since the interfaces for that already exist, I'm certainly not
going to disagree.

> timerfd() still leaves out the basic clock selection functionality
> provided by both setitimer() and timer_create().

Well, the setitimer ones do not really make sense for a timer that isn't
directly associated with one particular process. Once it's associated
with a file descriptor, it really isn't bound to any particular execution
context, and as such, virtual and profiling timers really don't make any
sense any more!

The only thing that exists outside of an execution context is really just
"relative" and "absolute".

Of course, you could still specify just what you want your timers to be
based on (ie the "realtime" vs "monotonic" thing), and possibly the
resolution, but it really does boil down to just those two choices (and
the rest is just confusion).

So I really don't think you lose a lot by just limiting it to "real time"
vs "relative time". Those really *are* the choices.

		Linus
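
[Editor's sketch] The "an event happened, then read() to learn more" pattern described above can be shown with any file descriptor. The Python sketch below uses a pipe as a stand-in for a timer descriptor and assumes a POSIX system. The 8-byte counter payload is an assumption made for this illustration, not a statement about the format used by the timerfd patch under discussion.

```python
import os
import select
import struct


def wait_then_read(fd, timeout=1.0):
    """select() only reports readiness; the details come from read()."""
    readable, _, _ = select.select([fd], [], [], timeout)
    if not readable:
        return None                   # no event within the timeout
    (count,) = struct.unpack("=Q", os.read(fd, 8))
    return count


def demo():
    r, w = os.pipe()
    try:
        nothing_yet = wait_then_read(r, timeout=0)
        os.write(w, struct.pack("=Q", 3))   # pretend three ticks were missed
        return nothing_yet, wait_then_read(r)
    finally:
        os.close(r)
        os.close(w)
```

The readiness check carries one bit of information; the counter itself only arrives through the second system call, which is the overhead Nicholas Miell's quoted remark is about.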
Re: PROBLEM: "Make nenuconfig" does not save parameters.
Sam Ravnborg <[EMAIL PROTECTED]> wrote:
> On Sat, Mar 10, 2007 at 10:34:41PM +0100, Jan Engelhardt wrote:
>> On Mar 10 2007 22:27, Sam Ravnborg wrote:
>> >On Sat, Mar 10, 2007 at 07:23:41PM +0100, Jan Engelhardt wrote:

>> >> Whether the 'working config file path' should change when you do
>> >> 'Save as Alternate' or not, is a menuconfig axiom. Ask Sam Ravnborg
>> >> if you want it changed :-)
>> >
>> >Current behaviour is not logical but on the other hand I do not
>> >see a big need to make it so.
>> >It seems that people very seldom uses "save alternate" anyway.
>> >
>> >But patches are welcome.
>>
>> ^_^ The patch has already been posted, has not it?
> No.
> Either we keep current behaviour

, which is misleading,

> or we change to the "normal"
> behaviour with a "Save as..." as know from all other programs.

, which is not desirable, as long as there is no "open" and "save" option
also working as "normal".

IMO the option should have the "Save a copy" semantics, since that's what the
name suggests.
--
Top 100 things you don't want the sysadmin to say:
51. YEEEHA!!! What a CRASH!!!

Friß, Spammer: [EMAIL PROTECTED] [EMAIL PROTECTED]
RSDL v0.29 backport to 2.6.18.8
Hello again,

I just saw that my 0.28 patch file was wrongly named 0.26 and that a new
version, 0.29, of RSDL has just come out... so here is RSDL 0.29
backported to a 2.6.18.8 kernel. It compiles, but I have not had the time
to fully test it yet.

>
> Here is an update for RSDL to version 0.28
>
> Full patch:
> http://ck.kolivas.org/patches/staircase-deadline/2.6.20-sched-rsdl-0.28.patch
>
> Series:
> http://ck.kolivas.org/patches/staircase-deadline/2.6.20/
>
> The patch to get you from 0.26 to 0.28:
> http://ck.kolivas.org/patches/staircase-deadline/2.6.20/sched-rsdl-0.26-0.28.patch
>
> A similar patch and directories will be made for 2.6.21-rc3
> without further announcement
>

Once again, thanks Con for this nice piece of code.

Also note that this patch already includes a few other patches from
2.6.19+ kernels, and there might also be other small pieces of code
coming from a 2.6.19+ kernel:

PATCH 1:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.20.y.git;a=commit;h=ece8a684c75df215320b4155944979e3f78c5c93

PATCH 2:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.20.y.git;a=commit;h=08c183f31bdbb709f177f6d3110d5f288ea33933

PATCH 3: Original RSDL patches (thanks again Con)
http://ck.kolivas.org/patches/staircase-deadline/

Because of the project I'm currently working on, this will help me
compare heavy loads on a Debian Sarge/Etch 32/64 platform over the next
few weeks. Suggestions on benchmark tools would be greatly appreciated.

Again, I don't know if this will be helpful to anybody... but who knows!

- vin

sched-rsdl-0.29-backport-kernel-2.6.18.patch.gz
Description: GNU Zip compressed data
RSDL v0.28 for 2.6.20 -> backport to 2.6.18.8
Hi all,

>
> Here is an update for RSDL to version 0.28
>
> Full patch:
> http://ck.kolivas.org/patches/staircase-deadline/2.6.20-sched-rsdl-0.28.patch
>
> Series:
> http://ck.kolivas.org/patches/staircase-deadline/2.6.20/
>
> The patch to get you from 0.26 to 0.28:
> http://ck.kolivas.org/patches/staircase-deadline/2.6.20/sched-rsdl-0.26-0.28.patch
>
> A similar patch and directories will be made for 2.6.21-rc3
> without further announcement
>

First of all, thanks Con for this nice piece of code.

Over the last few days I have been trying to backport this new scheduler
to a 2.6.18 kernel. After a lot of effort I have finally been able to
compile and run an RSDL-patched 2.6.18.8 kernel on an x86_64 arch, and my
test PC actually booted 2-3 seconds faster with it than with a vanilla
2.6.18.8 kernel.

This patch includes a few other patches from 2.6.19+ kernels:

PATCH 1:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.20.y.git;a=commit;h=ece8a684c75df215320b4155944979e3f78c5c93

PATCH 2:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.20.y.git;a=commit;h=08c183f31bdbb709f177f6d3110d5f288ea33933

PATCH 3: The patch to get you from 0.26 to 0.28:
http://ck.kolivas.org/patches/staircase-deadline/2.6.20/sched-rsdl-0.26-0.28.patch

There might also be other small pieces of code coming from a 2.6.19+
kernel.

Because of the project I'm currently working on, this will help me
compare heavy loads on a Debian Sarge/Etch 32/64 platform over the next
few weeks. Suggestions on benchmark tools would be greatly appreciated.

I don't know if this will be helpful to anybody, but I thought it would
be nice to give it back to the lkml community.

- vin

sched-rsdl-0.26-backport-kernel-2.6.18.patch.gz
Description: sched-rsdl-0.26-backport-kernel-2.6.18.patch.gz
Re: [PATCH v2] Bitbanging i2c bus driver using the GPIO API
On Saturday 10 March 2007 5:13 am, Haavard Skinnemoen wrote:
> This is a very simple bitbanging i2c bus driver utilizing the new
> arch-neutral GPIO API.
...
> ---
> This patch is different from the first patch in the following ways:
>   * Handles pins set up as open drain (aka multidrive) by toggling
>     the output value instead of the direction
>   * Handles output-only SCL pins the same way, and also does not
>     install a getscl() callback for such pins
>   * Does not add anything to include/linux/i2c-ids.h
>   * Sets the output value explicitly after changing the direction to
>     output.
>   * Plugs a memory leak in remove() -- algo_data wasn't freed.
>   * Prints out the pin IDs in decimal, with an extra note when clock
>     stretching isn't supported
>
> This version has been compile-tested only. I'll give it a spin when I
> get back to work on monday.
>
> Dave, does this address your concerns?

Yes, though see my followup to Jean's note. Unless I make time to test
this out on some system, the issues seem to be:

  (a) will need to change once gpio_direction_output() gains that
      second argument;
  (b) i2c-gpio.h could stand one minor comment addition to highlight
      an assumption.

Looking good!

- Dave
Re: [PATCH v2] Bitbanging i2c bus driver using the GPIO API
On Saturday 10 March 2007 12:15 pm, Jean Delvare wrote:
> Hi Haavard,
>
> On Sat, 10 Mar 2007 14:13:28 +0100, Haavard Skinnemoen wrote:
> > This is a very simple bitbanging i2c bus driver utilizing the new
> > arch-neutral GPIO API. Useful for chips that don't have a built-in
> > i2c controller, additional i2c busses, or testing purposes.

This updated version looks a lot better. However it doesn't address the
API change -- gpio_direction_output(gpio, initial_value) -- which is
understandable since that patch hasn't yet merged.

> I like the idea very much. Would this let us get rid of i2c-ixp2000?
> i2c-ixp4xx? scx200_i2c? Other drivers?

There's CONFIG_GENERIC_GPIO support for ixp4xx (nyet upstream, ISTR it's
waiting on the gpio_direction_output update), so that one should be
particularly easy to replace. Presumably some other bitbang drivers could
vanish before long too.

> What value will you get if the SDA pin is open-drain and currently in
> output mode?

For output GPIOs, gpio_get_value() is specified to either return the
actual value at the pin ... or zero, if the hardware can't do that. Most
GPIO pins *can* do that. (Specifically, that's how AT91 GPIOs work, open
drain or otherwise.)

(However, there can be various latencies involved. On one chip when I
wrote the output value, then immediately read it back, I got the old
value. Reason: the GPIO controller clock needed to tick first in order to
latch the new input value! It was only about 30 MHz, so the back-to-back
instructions were too fast. You can also sometimes notice capacitance
causing similar delays. Of course those latencies apply regardless of pin
direction.)

I think Haavard is assuming the GPIO actually returns that value, since
otherwise there'd be no point in trying to use the open drain mode. It'd
be worth capturing that in the i2c-gpio.h definition for that struct.

> Are such GPIO pins actually able to detect that the pin is
> low while they are not themselves driving it low?

Given a "yes" to the above, then clearly "yes" here too. As I noted, if
it can't actually sense the value at the pin, that function should always
return zero.

- Dave
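
[Editor's sketch] The open-drain question above comes down to wired-AND behaviour: a device may pull the line low or release it, and the line reads high only when nobody is pulling it low. The Python model below illustrates why a master that has written 1 must still be able to read back 0. `OpenDrainLine` and its methods are hypothetical names for this sketch and are not the kernel GPIO API.

```python
class OpenDrainLine:
    """Toy wired-AND bus line with an implicit pull-up resistor."""

    def __init__(self):
        self._pulling_low = set()

    def set_value(self, device, value):
        if value:
            self._pulling_low.discard(device)   # release the line
        else:
            self._pulling_low.add(device)       # actively drive it low

    def get_value(self):
        # What a pin that senses the real line level would report.
        return 0 if self._pulling_low else 1


def ack_seen_by_master():
    sda = OpenDrainLine()
    sda.set_value("master", 1)     # master releases SDA
    idle = sda.get_value()
    sda.set_value("slave", 0)      # slave acknowledges by pulling SDA low
    acked = sda.get_value()
    return idle, acked
```

If the read returned the value last written by the master instead of the line level, the acknowledge in this model would never be observed, which is the assumption Dave suggests documenting in i2c-gpio.h.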
Re: and try remove another quirk on this computers Re: [3/6] 2.6.21-rc2: known regressions
On Fri, 2007-03-09 at 21:41 -0800, Linus Torvalds wrote:
>
> On Sat, 10 Mar 2007, Sergio Monteiro Basto wrote:
> >
> > With this quirk I got this oops on hibernate (but computer still
> > working)
>
> Well, strictly speaking it's a warning, not an oops per se.
>
> What happens is that the quirk wants to do an "ioremap_nocache()", which
> allocates memory, and that happens very early during initialization when
> interrupts are disabled.
>
> And you're really not supposed to allocate memory, except using
> GFP_ATOMIC. But we've always been lax about that during early boot, so we
> have stuff that does. And resume ends up doing a lot of the same things
> early boot does, and shows issues like this.
>
> So the quirk is probably still a good idea, and the warning message is
> just that - a very scary warning message, but not an indicator that
> anything is seriously screwed up for you.
>
> (It is an indication of a real bug, though, even though it's harmless in
> practice in this case)

Hi, thanks.

Just to report: I tested the latest Fedora kernel (2.6.20-1.2981.fc7),
which is based on 2.6.21-rc3-git5, without any problem other than the
scary warning discussed in this email :)

Best regards,
--
Sérgio M. B.
Re: RSDL-mm 0.28
On Sat, Mar 10, 2007 at 10:01:32PM -0600, Matt Mackall wrote:
> On Sun, Mar 11, 2007 at 01:28:22PM +1100, Con Kolivas wrote:
> > Ok I don't think there's any actual accounting problem here per se
> > (although I did just recently post a bugfix for rsdl however I think
> > that's unrelated). What I think is going on in the ccache testcase is
> > that all the work is being offloaded to kernel threads reading/writing
> > to/from the filesystem and the make is not getting any actual cpu
> > time.
>
> I don't see significant system time while this is happening.

Also, it's running pretty much entirely out of page cache so there
wouldn't be a whole lot for kernel threads to do.

--
Mathematics is the supreme nostalgia of our time.
Re: RSDL-mm 0.28
On Sun, Mar 11, 2007 at 01:28:22PM +1100, Con Kolivas wrote:
> >            make   -j 5   ccache
> > beryl      ok     good   awful
> > galeon     good   good   bad
> > mp3        good   good   bad
> > terminal   good   good   bad/ok
> > mouse      good   good   bad/ok
...
> >RSDL makes most of the noyield hit back in normal make and then some
> >with ccache. Impressive. But ccache is still destroying interactivity
> >somehow. The ccache effect is fairly visible even with non-parallel
> >'make'.
>
> Ok I don't think there's any actual accounting problem here per se
> (although I did just recently post a bugfix for rsdl however I think
> that's unrelated). What I think is going on in the ccache testcase is
> that all the work is being offloaded to kernel threads reading/writing
> to/from the filesystem and the make is not getting any actual cpu
> time.

I don't see significant system time while this is happening.

> This is "worked around" in mainline thanks to the testing for
> sleeping on uninterruptible sleep in the interactivity estimator. What
> I suspect is happening is kernel threads that are running nice -5 are
> doing all the work on make's behalf in the setting of ccache since it
> is mostly i/o bound. The reason for -nice values on kernel threads is
> questionable anyway. Can you try renicing your kernel threads all to
> nice 0 and see what effect that has? Obviously this doesn't need a
> recompile, but is simple enough to implement in kthread code as a new
> default.

Sorry, little to no benefit.

--
Mathematics is the supreme nostalgia of our time.
[PATCH][RSDL-mm 7/7] sched: document rsdl cpu scheduler
From: Con Kolivas <[EMAIL PROTECTED]> Add comprehensive documentation of the RSDL cpu scheduler design. Signed-off-by: Con Kolivas <[EMAIL PROTECTED]> Cc: Ingo Molnar <[EMAIL PROTECTED]> Cc: Nick Piggin <[EMAIL PROTECTED]> Cc: "Siddha, Suresh B" <[EMAIL PROTECTED]> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> --- Documentation/sched-design.txt | 273 - 1 file changed, 267 insertions(+), 6 deletions(-) Index: linux-2.6.21-rc3-mm2/Documentation/sched-design.txt === --- linux-2.6.21-rc3-mm2.orig/Documentation/sched-design.txt2007-03-11 14:47:57.0 +1100 +++ linux-2.6.21-rc3-mm2/Documentation/sched-design.txt 2007-03-11 14:48:00.0 +1100 @@ -1,11 +1,14 @@ - Goals, Design and Implementation of the - new ultra-scalable O(1) scheduler + Goals, Design and Implementation of the ultra-scalable O(1) scheduler by + Ingo Molnar and the Rotating Staircase Deadline cpu scheduler policy + designed by Con Kolivas. - This is an edited version of an email Ingo Molnar sent to - lkml on 4 Jan 2002. It describes the goals, design, and - implementation of Ingo's new ultra-scalable O(1) scheduler. - Last Updated: 18 April 2002. + This was originally an edited version of an email Ingo Molnar sent to + lkml on 4 Jan 2002. It describes the goals, design, and implementation + of Ingo's ultra-scalable O(1) scheduler. It now contains a description + of the Rotating Staircase Deadline priority scheduler that was built on + this design. + Last Updated: Sun Feb 25 2007 Goal @@ -163,3 +166,261 @@ certain code paths and data constructs. code is smaller than the old one. Ingo + + +Rotating Staircase Deadline cpu scheduler policy + + +Design summary +== + +A novel design which incorporates a foreground-background descending priority +system (the staircase) with runqueue managed minor and major epochs (rotation +and deadline). + + +Features + + +A starvation free, strict fairness O(1) scalable design with interactivity +as good as the above restrictions can provide. 
There is no interactivity +estimator, no sleep/run measurements and only simple fixed accounting. +The design has strict enough a design and accounting that task behaviour +can be modelled and maximum scheduling latencies can be predicted by +the virtual deadline mechanism that manages runqueues. The prime concern +in this design is to maintain fairness at all costs determined by nice level, +yet to maintain as good interactivity as can be allowed within the +constraints of strict fairness. + + +Design description +== + +RSDL works off the principle of providing each task a quota of runtime that +it is allowed to run at each priority level equal to its static priority +(ie. its nice level) and every priority below that. When each task is queued, +the cpu that it is queued onto also keeps a record of that quota. If the +task uses up its quota it is decremented one priority level. Also, if the cpu +notices a quota full has been used for that priority level, it pushes +everything remaining at that priority level to the next lowest priority +level. Once every runtime quota has been consumed of every priority level, +a task is queued on the "expired" array. When no other tasks exist with +quota, the expired array is activated and fresh quotas are handed out. This +is all done in O(1). + + +Design details +== + +Each cpu has its own runqueue which micromanages its own epochs, and each +task keeps a record of its own entitlement of cpu time. Most of the rest +of these details apply to non-realtime tasks as rt task management is +straight forward. + +Each runqueue keeps a record of what major epoch it is up to in the +rq->prio_rotation field which is incremented on each major epoch. It also +keeps a record of quota available to each priority value valid for that +major epoch in rq->prio_quota[]. + +Each task keeps a record of what major runqueue epoch it was last running +on in p->rotation. 
It also keeps a record of what priority levels it has +already been allocated quota from during this epoch in a bitmap p->bitmap. + +The only tunable that determines all other details is the RR_INTERVAL. This +is set to 6ms (minimum on 1000HZ, higher at different HZ values). + +All tasks are initially given a quota based on RR_INTERVAL. This is equal to +RR_INTERVAL between nice values of 0 and 19, and progressively larger for +nice values from -1 to -20. This is assigned to p->quota and only changes +with changes in nice level. + +As a task is first queued, it checks in recalc_task_prio to see if it has +run at this runqueue's current priority rotation. If it has not, it will +have its p->prio level set to equal its p->static_prio (nice level) and will +be given a p->time_slice equal to the p->quota, and
[PATCH][RSDL-mm 3/7] sched: remove noninteractive flag
From: Con Kolivas <[EMAIL PROTECTED]>

Remove the TASK_NONINTERACTIVE flag as it will no longer be used.

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>
Cc: Nick Piggin <[EMAIL PROTECTED]>
Cc: "Siddha, Suresh B" <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 fs/pipe.c             |    7 +--
 include/linux/sched.h |    3 +--
 2 files changed, 2 insertions(+), 8 deletions(-)

Index: linux-2.6.21-rc3-mm2/fs/pipe.c
===
--- linux-2.6.21-rc3-mm2.orig/fs/pipe.c	2007-03-11 14:47:57.0 +1100
+++ linux-2.6.21-rc3-mm2/fs/pipe.c	2007-03-11 14:47:59.0 +1100
@@ -41,12 +41,7 @@ void pipe_wait(struct pipe_inode_info *p
 {
 	DEFINE_WAIT(wait);

-	/*
-	 * Pipes are system-local resources, so sleeping on them
-	 * is considered a noninteractive wait:
-	 */
-	prepare_to_wait(&pipe->wait, &wait,
-			TASK_INTERRUPTIBLE | TASK_NONINTERACTIVE);
+	prepare_to_wait(&pipe->wait, &wait, TASK_INTERRUPTIBLE);
 	if (pipe->inode)
 		mutex_unlock(&pipe->inode->i_mutex);
 	schedule();
Index: linux-2.6.21-rc3-mm2/include/linux/sched.h
===
--- linux-2.6.21-rc3-mm2.orig/include/linux/sched.h	2007-03-11 14:47:57.0 +1100
+++ linux-2.6.21-rc3-mm2/include/linux/sched.h	2007-03-11 14:47:59.0 +1100
@@ -150,8 +150,7 @@ extern unsigned long weighted_cpuload(co
 #define EXIT_ZOMBIE		16
 #define EXIT_DEAD		32
 /* in tsk->state again */
-#define TASK_NONINTERACTIVE	64
-#define TASK_DEAD		128
+#define TASK_DEAD		64

 #define __set_task_state(tsk, state_value)		\
 	do { (tsk)->state = (state_value); } while (0)

--
-ck
[PATCH][RSDL-mm 4/7] sched: implement 180 bit sched bitmap
From: Con Kolivas <[EMAIL PROTECTED]>

Modify the sched_find_first_bit function to work on a 180bit long bitmap.

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>
Cc: Nick Piggin <[EMAIL PROTECTED]>
Cc: "Siddha, Suresh B" <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 include/asm-generic/bitops/sched.h |   10 ++
 include/asm-s390/bitops.h          |   12 +---
 2 files changed, 7 insertions(+), 15 deletions(-)

Index: linux-2.6.21-rc3-mm2/include/asm-generic/bitops/sched.h
===
--- linux-2.6.21-rc3-mm2.orig/include/asm-generic/bitops/sched.h	2007-03-11 14:47:57.0 +1100
+++ linux-2.6.21-rc3-mm2/include/asm-generic/bitops/sched.h	2007-03-11 14:47:59.0 +1100
@@ -6,8 +6,8 @@

 /*
  * Every architecture must define this function. It's the fastest
- * way of searching a 140-bit bitmap where the first 100 bits are
- * unlikely to be set. It's guaranteed that at least one of the 140
+ * way of searching a 180-bit bitmap where the first 100 bits are
+ * unlikely to be set. It's guaranteed that at least one of the 180
  * bits is cleared.
  */
 static inline int sched_find_first_bit(const unsigned long *b)
@@ -15,7 +15,7 @@ static inline int sched_find_first_bit(c
 #if BITS_PER_LONG == 64
 	if (unlikely(b[0]))
 		return __ffs(b[0]);
-	if (likely(b[1]))
+	if (b[1])
 		return __ffs(b[1]) + 64;
 	return __ffs(b[2]) + 128;
 #elif BITS_PER_LONG == 32
@@ -27,7 +27,9 @@ static inline int sched_find_first_bit(c
 		return __ffs(b[2]) + 64;
 	if (b[3])
 		return __ffs(b[3]) + 96;
-	return __ffs(b[4]) + 128;
+	if (b[4])
+		return __ffs(b[4]) + 128;
+	return __ffs(b[5]) + 160;
 #else
 #error BITS_PER_LONG not defined
 #endif
Index: linux-2.6.21-rc3-mm2/include/asm-s390/bitops.h
===
--- linux-2.6.21-rc3-mm2.orig/include/asm-s390/bitops.h	2007-03-11 14:47:57.0 +1100
+++ linux-2.6.21-rc3-mm2/include/asm-s390/bitops.h	2007-03-11 14:47:59.0 +1100
@@ -729,17 +729,7 @@ find_next_bit (const unsigned long * add
 	return offset + find_first_bit(p, size);
 }

-/*
- * Every architecture must define this function. It's the fastest
- * way of searching a 140-bit bitmap where the first 100 bits are
- * unlikely to be set. It's guaranteed that at least one of the 140
- * bits is cleared.
- */
-static inline int sched_find_first_bit(unsigned long *b)
-{
-	return find_first_bit(b, 140);
-}
-
+#include
 #include
 #include

--
-ck
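For readers who want to try the idea outside the kernel, here is a minimal userspace sketch of a first-set-bit search over a 180-bit priority bitmap. It is not the patch's code: the `sketch_` names are made up for illustration, GCC's `__builtin_ctzl()` stands in for the kernel's `__ffs()`, and a plain loop replaces the hand-unrolled branches. As with `__ffs()`, the caller is assumed to guarantee that at least one bit is set.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical names; the kernel uses BITS_PER_LONG and a fixed priority count. */
#define SKETCH_BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))
#define SKETCH_PRIO_BITS     180
#define SKETCH_WORDS \
	((SKETCH_PRIO_BITS + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Stand-in for __ffs(): index of the lowest set bit, undefined for 0. */
static inline int sketch_ffs(unsigned long word)
{
	return __builtin_ctzl(word);
}

/* Scan word by word; the last word is not tested because a set bit is assumed. */
static inline int sketch_find_first_bit(const unsigned long *b)
{
	int i;

	for (i = 0; i < SKETCH_WORDS - 1; i++)
		if (b[i])
			return sketch_ffs(b[i]) + i * SKETCH_BITS_PER_LONG;
	return sketch_ffs(b[SKETCH_WORDS - 1]) +
	       (SKETCH_WORDS - 1) * SKETCH_BITS_PER_LONG;
}

static inline void sketch_set_bit(unsigned long *b, int nr)
{
	b[nr / SKETCH_BITS_PER_LONG] |= 1UL << (nr % SKETCH_BITS_PER_LONG);
}

/* Usage: the lowest set bit wins, whichever word holds it. */
void sketch_bitmap_example(void)
{
	unsigned long map[SKETCH_WORDS] = { 0 };

	sketch_set_bit(map, 179);
	assert(sketch_find_first_bit(map) == 179);
	sketch_set_bit(map, 130);
	assert(sketch_find_first_bit(map) == 130);
}
```

With 64-bit longs the bitmap takes three words and with 32-bit longs six, which matches the b[2] and b[5] fallbacks in the patch.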
[PATCH][RSDL-mm 5/7] sched dont renice kernel threads
The practice of renicing kernel threads to negative nice values is of
questionable benefit at best, and at worst leads to larger latencies when
kernel threads are busy on behalf of other tasks.

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>
---

 kernel/workqueue.c |    1 -
 1 file changed, 1 deletion(-)

Index: linux-2.6.21-rc3-mm2/kernel/workqueue.c
===
--- linux-2.6.21-rc3-mm2.orig/kernel/workqueue.c	2007-03-11 14:47:57.0 +1100
+++ linux-2.6.21-rc3-mm2/kernel/workqueue.c	2007-03-11 14:47:59.0 +1100
@@ -294,7 +294,6 @@ static int worker_thread(void *__cwq)
 	if (!cwq->wq->freezeable)
 		current->flags |= PF_NOFREEZE;

-	set_user_nice(current, -5);
 	/*
 	 * We inherited MPOL_INTERLEAVE from the booting kernel.
 	 * Set MPOL_DEFAULT to insure node local allocations.

--
-ck
[PATCH][RSDL-mm 1/7] lists: add list splice tail
From: Con Kolivas <[EMAIL PROTECTED]>

Add a list_splice_tail variant of list_splice.

Patch-by: Peter Zijlstra <[EMAIL PROTECTED]>
Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>
Cc: Nick Piggin <[EMAIL PROTECTED]>
Cc: "Siddha, Suresh B" <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 include/linux/list.h |   42 ++
 1 file changed, 42 insertions(+)

Index: linux-2.6.21-rc3-mm2/include/linux/list.h
===
--- linux-2.6.21-rc3-mm2.orig/include/linux/list.h	2007-03-11 14:47:57.0 +1100
+++ linux-2.6.21-rc3-mm2/include/linux/list.h	2007-03-11 14:47:59.0 +1100
@@ -333,6 +333,20 @@ static inline void __list_splice(struct
 	at->prev = last;
 }

+static inline void __list_splice_tail(struct list_head *list,
+				      struct list_head *head)
+{
+	struct list_head *first = list->next;
+	struct list_head *last = list->prev;
+	struct list_head *at = head->prev;
+
+	first->prev = at;
+	at->next = first;
+
+	last->next = head;
+	head->prev = last;
+}
+
 /**
  * list_splice - join two lists
  * @list: the new list to add.
@@ -345,6 +359,18 @@ static inline void list_splice(struct li
 }

 /**
+ * list_splice_tail - join two lists at one's tail
+ * @list: the new list to add.
+ * @head: the place to add it in the first list.
+ */
+static inline void list_splice_tail(struct list_head *list,
+				    struct list_head *head)
+{
+	if (!list_empty(list))
+		__list_splice_tail(list, head);
+}
+
+/**
  * list_splice_init - join two lists and reinitialise the emptied list.
  * @list: the new list to add.
  * @head: the place to add it in the first list.
@@ -417,6 +443,22 @@ static inline void list_splice_init_rcu(
 }

 /**
+ * list_splice_tail_init - join 2 lists at one's tail & reinitialise emptied
+ * @list: the new list to add.
+ * @head: the place to add it in the first list.
+ *
+ * The list at @list is reinitialised
+ */
+static inline void list_splice_tail_init(struct list_head *list,
+					 struct list_head *head)
+{
+	if (!list_empty(list)) {
+		__list_splice_tail(list, head);
+		INIT_LIST_HEAD(list);
+	}
+}
+
+/**
  * list_entry - get the struct for this entry
  * @ptr: the list_head pointer.
  * @type: the type of the struct this is embedded in.

--
-ck
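To see what the tail splice does to the pointers, the following userspace sketch re-creates just enough of a circular doubly linked list to run the patch's __list_splice_tail() logic. The struct and helpers here are simplified stand-ins written for this example, not the kernel's list.h, and `sketch_splice_example()` is a made-up name.

```c
#include <assert.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static inline void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static inline int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static inline void list_add_tail(struct list_head *entry, struct list_head *head)
{
	struct list_head *prev = head->prev;

	entry->next = head;
	entry->prev = prev;
	prev->next = entry;
	head->prev = entry;
}

/* Same pointer updates as the patch: hook list's entries in before head. */
static inline void __list_splice_tail(struct list_head *list,
				      struct list_head *head)
{
	struct list_head *first = list->next;
	struct list_head *last = list->prev;
	struct list_head *at = head->prev;

	first->prev = at;
	at->next = first;

	last->next = head;
	head->prev = last;
}

static inline void list_splice_tail_init(struct list_head *list,
					 struct list_head *head)
{
	if (!list_empty(list)) {
		__list_splice_tail(list, head);
		INIT_LIST_HEAD(list);
	}
}

/* Usage: head holds a,b and list holds c,d; afterwards head holds a,b,c,d. */
void sketch_splice_example(void)
{
	struct list_head head, list, a, b, c, d;

	INIT_LIST_HEAD(&head);
	INIT_LIST_HEAD(&list);
	list_add_tail(&a, &head);
	list_add_tail(&b, &head);
	list_add_tail(&c, &list);
	list_add_tail(&d, &list);

	list_splice_tail_init(&list, &head);

	assert(head.next == &a && a.next == &b && b.next == &c);
	assert(c.next == &d && d.next == &head && head.prev == &d);
	assert(list_empty(&list));
}
```

The order of the spliced entries is preserved, and the whole operation touches four pointers regardless of list length.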
[PATCH][RSDL-mm 2/7] sched: remove sleepavg from proc
From: Con Kolivas <[EMAIL PROTECTED]>

Remove the sleep_avg field from proc output as it will be removed from the
task_struct.

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>
Cc: Nick Piggin <[EMAIL PROTECTED]>
Cc: "Siddha, Suresh B" <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 fs/proc/array.c |    2 --
 1 file changed, 2 deletions(-)

Index: linux-2.6.21-rc3-mm2/fs/proc/array.c
===
--- linux-2.6.21-rc3-mm2.orig/fs/proc/array.c	2007-03-11 14:47:57.0 +1100
+++ linux-2.6.21-rc3-mm2/fs/proc/array.c	2007-03-11 14:47:59.0 +1100
@@ -171,7 +171,6 @@ static inline char * task_state(struct t

 	buffer += sprintf(buffer,
 		"State:\t%s\n"
-		"SleepAVG:\t%lu%%\n"
 		"Tgid:\t%d\n"
 		"Pid:\t%d\n"
 		"PPid:\t%d\n"
@@ -179,7 +178,6 @@ static inline char * task_state(struct t
 		"Uid:\t%d\t%d\t%d\t%d\n"
 		"Gid:\t%d\t%d\t%d\t%d\n",
 		get_task_state(p),
-		(p->sleep_avg/1024)*100/(102000/1024),
 		p->tgid, p->pid,
 		pid_alive(p) ? rcu_dereference(p->parent)->tgid : 0,
 		tracer_pid,

--
-ck
[PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
What follows this email is a patch series for the latest version of the RSDL
cpu scheduler (ie v0.29). I have addressed all bugs that I am able to
reproduce in this version so if some people would be kind enough to test if
there are any hidden bugs or oops lurking, it would be nice to know in
anticipation of putting this back in -mm. Thanks.

Full patch for 2.6.21-rc3-mm2:
http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.29.patch

Patch series (which will follow this email):
http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2/

Changelog:
- Fixed the longstanding buggy bitmap problem which occurred due to swapping
  arrays when there were still tasks on the active array.
- Fixed preemption of realtime tasks when rt prio inheritance elevated their
  priority.
- Made kernel threads not be reniced to -5 by default
- Changed sched_yield behaviour of SCHED_NORMAL (SCHED_OTHER) to resemble
  realtime task yielding.

--
-ck
Re: RSDL-mm 0.28
On Sunday 11 March 2007 14:39, Andrew Morton wrote:
> > On Sun, 11 Mar 2007 14:59:28 +1100 Con Kolivas <[EMAIL PROTECTED]> wrote:
> > > Bottom line: we've had a _lot_ of problems with the new yield()
> > > semantics. We effectively broke back-compatibility by changing its
> > > behaviour a lot, and we can't really turn around and blame application
> > > developers for that.
> >
> > So... I would take it that's a yes for a recommendation with respect to
> > implementing a new yield() ? A new scheduler is as good a time as any to
> > do it.
>
> I guess so. We'd, err, need to gather Ingo's input ;)

cc'ed. Don't you hate timezones?

> Perhaps a suitable way of doing this would be to characterise then emulate
> the 2.4 behaviour. As long as it turns out to be vaguely sensible.

It's really very simple. We just go to the end of the current queued
priority on the same array instead of swapping to the expired array; ie we
do what realtime tasks currently do. It works fine here locally afaict.

--
-ck
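As a rough illustration of the yield behaviour described above (requeue at the tail of the same priority level rather than move to the expired array), here is a toy userspace model. It is only a sketch under simplifying assumptions: one priority level is modelled as a small FIFO of task ids with the running task at index 0, and every name in it (`sketch_prio_queue`, `sketch_yield`) is invented for the example rather than taken from the scheduler.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_TASKS 8

/* Toy model of one priority level on the active array: a FIFO of task ids. */
struct sketch_prio_queue {
	int task[SKETCH_MAX_TASKS];
	int len;
};

/*
 * The yielding task sits at index 0. It is moved behind its peers at the
 * same priority and stays on the same (active) queue.
 */
static void sketch_yield(struct sketch_prio_queue *q)
{
	int self;

	if (q->len < 2)
		return;		/* nobody else to yield to at this level */

	self = q->task[0];
	memmove(&q->task[0], &q->task[1],
		(size_t)(q->len - 1) * sizeof(q->task[0]));
	q->task[q->len - 1] = self;
}

/* Usage: tasks 1,2,3 queued; task 1 yields and ends up last. */
void sketch_yield_example(void)
{
	struct sketch_prio_queue q = { { 1, 2, 3 }, 3 };

	sketch_yield(&q);
	assert(q.task[0] == 2 && q.task[1] == 3 && q.task[2] == 1);
}
```

A real runqueue would use linked lists and a priority bitmap, so this shows only the ordering effect, not the data structures.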
Re: RSDL-mm 0.28
On Sun, 11 Mar 2007 13:28:22 +1100 "Con Kolivas" <[EMAIL PROTECTED]> wrote:
>> Well... are you advocating we change sched_yield semantics to a
>> gentler form?

On Sat, Mar 10, 2007 at 07:16:14PM -0800, Andrew Morton wrote:
> From a practical POV: our present yield() behaviour is so truly awful that
> it's basically always a bug to use it. This probably isn't a good thing.
> So yes, I do think that we should have a rethink and try to come up with
> behaviour which is more in accord with what application developers expect
> yield() to do.

ISTR that apps varied wrt. their expectations for yield(). Some,
particularly those using it to implement multi-tiered userspace locks,
really did expect to go all the way to the back of the queue. (Rumor has
it that realtime apps break otherwise.) Others wanted a kinder, gentler,
"mistress, please hit me, but not too hard" opportunity to let someone
else have a little cpu time, particularly when userspace is spinning in
some sort of busywait. In both scenarios something very much against the
latest trends of Linux kernel politics is done by userspace.

On Sat, Mar 10, 2007 at 07:16:14PM -0800, Andrew Morton wrote:
> otoh,
> a) we should have done this five years ago. Instead, we've spent that
>    time training userspace programmers to not use yield(), so perhaps
>    there's little to be gained in changing it now.
> b) if we _were_ to change yield(), people would use it more, and their
>    applications would of course suck bigtime when run on earlier 2.6
>    kernels.
> Bottom line: we've had a _lot_ of problems with the new yield() semantics.
> We effectively broke back-compatibility by changing its behaviour a lot,
> and we can't really turn around and blame application developers for that.

My dumb idea would be to break out new syscalls: one for the kinder,
gentler version, one for the serious version. Or otherwise pass an
argument indicating the expected behavior. Then a dumb app can be
LD_PRELOAD'd into calling whatever makes it run fastest.

-- wli
Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...
On Sat, 10 Mar 2007, Linus Torvalds wrote:

> (That said, using "struct itimerspec" might be a good idea. That would
> also obviate the need for TFD_TIMER_SEQ, since an itimerspec automatically
> has both "base" and "incremental" parts).

But TFD_TIMER_SEQ is a simple auto-rearm case of TFD_TIMER_REL. So the
timespec is sufficient too (in all three cases we just need *one* time).
Actually, the only place where I can find the itimerspec useful is indeed
with TFD_TIMER_SEQ, in cases where you want your clock starting at a given
time (it_value) *and* with the given frequency (it_interval).

- Davide
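The "base" and "incremental" parts mentioned above map onto the two fields of the POSIX struct itimerspec. The sketch below only fills in such a structure; it does not call the timerfd interface under discussion in this thread, whose API was still being designed. The helper name `sketch_make_periodic` and its parameters are assumptions made for the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* expose struct itimerspec in <time.h> */
#endif
#include <assert.h>
#include <time.h>

/*
 * Describe a timer that first fires after first_sec seconds (it_value,
 * the "base" part) and then repeats every period_ms milliseconds
 * (it_interval, the "incremental" part).
 */
static struct itimerspec sketch_make_periodic(time_t first_sec, long period_ms)
{
	struct itimerspec its;

	its.it_value.tv_sec = first_sec;
	its.it_value.tv_nsec = 0;
	its.it_interval.tv_sec = period_ms / 1000;
	its.it_interval.tv_nsec = (period_ms % 1000) * 1000000L;
	return its;
}

/* Usage: start in 5 seconds, then tick four times a second. */
void sketch_itimerspec_example(void)
{
	struct itimerspec its = sketch_make_periodic(5, 250);

	assert(its.it_value.tv_sec == 5);
	assert(its.it_interval.tv_sec == 0);
	assert(its.it_interval.tv_nsec == 250000000L);
}
```

A zero it_interval would describe a one-shot timer, which is why a single timespec is enough for the non-repeating cases.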
Re: RSDL-mm 0.28
> On Sun, 11 Mar 2007 14:59:28 +1100 Con Kolivas <[EMAIL PROTECTED]> wrote:
> > Bottom line: we've had a _lot_ of problems with the new yield() semantics.
> > We effectively broke back-compatibility by changing its behaviour a lot,
> > and we can't really turn around and blame application developers for that.
>
> So... I would take it that's a yes for a recommendation with respect to
> implementing a new yield() ? A new scheduler is as good a time as any to do
> it.

I guess so. We'd, err, need to gather Ingo's input ;)

Perhaps a suitable way of doing this would be to characterise then emulate
the 2.4 behaviour. As long as it turns out to be vaguely sensible.
ATA: abnormal status 0x7F on port 0xNNNN since 2.6.20
Hi, Since linux 2.6.20 the kernel log shows at boot time these error. The system are stable, but shows this, that in 2.6.19.N does not show. (please CC to my email, i am currently not subscribe to lkml) Thanks, Linux version 2.6.20.2 ([EMAIL PROTECTED]) (gcc version 3.4.6) #1 PREEMPT Fri Mar 9 21:43:32 ART 2007 BIOS-provided physical RAM map: sanitize start sanitize end copy_e820_map() start: size: 0009fc00 end: 0009fc00 type: 1 copy_e820_map() type is E820_RAM copy_e820_map() start: 0009fc00 size: 0400 end: 000a type: 2 copy_e820_map() start: 000f size: 0001 end: 0010 type: 2 copy_e820_map() start: 0010 size: 1fef end: 1fff type: 1 copy_e820_map() type is E820_RAM copy_e820_map() start: 1fff size: 8000 end: 1fff8000 type: 3 copy_e820_map() start: 1fff8000 size: 8000 end: 2000 type: 4 copy_e820_map() start: fec0 size: 1000 end: fec01000 type: 2 copy_e820_map() start: fee0 size: 1000 end: fee01000 type: 2 copy_e820_map() start: fff8 size: 0008 end: 0001 type: 2 BIOS-e820: - 0009fc00 (usable) BIOS-e820: 0009fc00 - 000a (reserved) BIOS-e820: 000f - 0010 (reserved) BIOS-e820: 0010 - 1fff (usable) BIOS-e820: 1fff - 1fff8000 (ACPI data) BIOS-e820: 1fff8000 - 2000 (ACPI NVS) BIOS-e820: fec0 - fec01000 (reserved) BIOS-e820: fee0 - fee01000 (reserved) BIOS-e820: fff8 - 0001 (reserved) 511MB LOWMEM available. Entering add_active_range(0, 0, 131056) 0 entries of 256 used Zone PFN ranges: DMA 0 -> 4096 Normal 4096 -> 131056 early_node_map[1] active PFN ranges 0:0 -> 131056 On node 0 totalpages: 131056 DMA zone: 32 pages used for memmap DMA zone: 0 pages reserved DMA zone: 4064 pages, LIFO batch:0 Normal zone: 991 pages used for memmap Normal zone: 125969 pages, LIFO batch:31 DMI 2.3 present. 
ACPI: RSDP (v000 AMI ) @ 0x000fa8c0 ACPI: RSDT (v001 AMIINT VIA_K7 0x0010 MSFT 0x0097) @ 0x1fff ACPI: FADT (v001 AMIINT VIA_K7 0x0011 MSFT 0x0097) @ 0x1fff0030 ACPI: MADT (v001 AMIINT VIA_K7 0x0009 MSFT 0x0097) @ 0x1fff00c0 ACPI: DSDT (v001VIAK7VT4 0x1000 INTL 0x02002024) @ 0x ACPI: PM-Timer IO Port: 0x808 Allocating PCI resources starting at 3000 (gap: 2000:dec0) Detected 1666.250 MHz processor. Built 1 zonelists. Total pages: 130033 Kernel command line: BOOT_IMAGE=2.6.20.2 ro root=305 Enabling fast FPU save and restore... done. Enabling unmasked SIMD FPU exception support... done. Initializing CPU#0 PID hash table entries: 2048 (order: 11, 8192 bytes) Console: colour VGA+ 80x25 Dentry cache hash table entries: 65536 (order: 6, 262144 bytes) Inode-cache hash table entries: 32768 (order: 5, 131072 bytes) Memory: 516100k/524224k available (2321k kernel code, 7572k reserved, 462k data, 144k init, 0k highmem) virtual kernel memory layout: fixmap : 0x8000 - 0xf000 ( 28 kB) vmalloc : 0xe080 - 0x6000 ( 503 MB) lowmem : 0xc000 - 0xdfff ( 511 MB) .init : 0xc03bc000 - 0xc03e ( 144 kB) .data : 0xc0344554 - 0xc03b806c ( 462 kB) .text : 0xc010 - 0xc0344554 (2321 kB) Checking if this processor honours the WP bit even in supervisor mode... Ok. Calibrating delay using timer specific routine.. 3334.64 BogoMIPS (lpj=1667322) Mount-cache hash table entries: 512 CPU: After generic identify, caps: 0383fbff c1cbfbff CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) CPU: L2 Cache: 256K (64 bytes/line) CPU: After all inits, caps: 0383fbff c1cbfbff 0420 Intel machine check architecture supported. Intel machine check reporting enabled on CPU#0. CPU: AMD Sempron(tm) 2400+ stepping 01 Checking 'hlt' instruction... OK. 
ACPI: Core revision 20060707 ACPI: setting ELCR to 0200 (from 0c88) NET: Registered protocol family 16 ACPI: bus type pci registered PCI: PCI BIOS revision 2.10 entry at 0xfda71, last bus=1 PCI: Using configuration type 1 Setting up standard PCI resources ACPI: Interpreter enabled ACPI: Using PIC for interrupt routing ACPI: PCI Root Bridge [PCI0] (:00) PCI: Probing PCI hardware (bus 00) Boot video device is :01:00.0 ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT] ACPI: Power Resource [URP1] (off) ACPI: Power Resource [URP2] (off) ACPI: Power Resource [FDDP] (off) ACPI: Power Resource [LPTP] (off) ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 7 10 *11 12 14 15) ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 7 *10 11 12 14 15) ACPI: PCI Interrupt
Re: RSDL-mm 0.28
On Sunday 11 March 2007 14:16, Andrew Morton wrote:
> > On Sun, 11 Mar 2007 13:28:22 +1100 "Con Kolivas" <[EMAIL PROTECTED]> wrote:
> > Well... are you advocating we change sched_yield semantics to a
> > gentler form?
>
> From a practical POV: our present yield() behaviour is so truly awful that
> it's basically always a bug to use it. This probably isn't a good thing.
>
> So yes, I do think that we should have a rethink and try to come up with
> behaviour which is more in accord with what application developers expect
> yield() to do.
>
> otoh,
>
> a) we should have done this five years ago. Instead, we've spent that
>    time training userspace programmers to not use yield(), so perhaps
>    there's little to be gained in changing it now.
>
> b) if we _were_ to change yield(), people would use it more, and their
>    applications would of course suck bigtime when run on earlier 2.6
>    kernels.
>
>
> Bottom line: we've had a _lot_ of problems with the new yield() semantics.
> We effectively broke back-compatibility by changing its behaviour a lot,
> and we can't really turn around and blame application developers for that.

So... I would take it that's a yes for a recommendation with respect to
implementing a new yield() ? A new scheduler is as good a time as any to do
it.

--
-ck
Re: [QUICKLIST 2/6] i386: quicklist support
On Sat, Mar 10, 2007 at 06:09:34PM -0800, Christoph Lameter wrote:
> i386: Convert to quicklists
> Implement the i386 management of pgd and pmds using quicklists.

I approve, though it would be nice if ptes had an interface operating
on struct page * to use.

On Sat, Mar 10, 2007 at 06:09:34PM -0800, Christoph Lameter wrote:
> The i386 management of page table pages currently uses page sized slabs.
> The page state is therefore mainly determined by the slab code. However,
> i386 also uses its own fields in the page struct to mark special pages
> and to build a list of pgds using the ->private and ->index field (yuck!).
> This has been finely tuned to work right with SLAB but SLUB needs more
> control over the page struct. Currently the only way for SLUB to support
> these slabs is through special casing PAGE_SIZE slabs.
> If we use quicklists instead then we can avoid the mess, and also the
> overhead of manipulating page sized objects through slab.

Hey! I did quite well given the constraints under which I was operating.

-- wli
Re: [PATCH] Use more gcc extensions in the Linux headers
On Sun, 2007-03-11 at 03:58 +0100, Jan Engelhardt wrote:
> >-#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
> >+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
> >+
> 80 cols *cough* :)

I think your cough added a column?

Rusty.
Re: RSDL-mm 0.28
> On Sun, 11 Mar 2007 13:28:22 +1100 "Con Kolivas" <[EMAIL PROTECTED]> wrote:
> Well... are you advocating we change sched_yield semantics to a
> gentler form?

From a practical POV: our present yield() behaviour is so truly awful that
it's basically always a bug to use it. This probably isn't a good thing.

So yes, I do think that we should have a rethink and try to come up with
behaviour which is more in accord with what application developers expect
yield() to do.

otoh,

a) we should have done this five years ago. Instead, we've spent that
   time training userspace programmers to not use yield(), so perhaps
   there's little to be gained in changing it now.

b) if we _were_ to change yield(), people would use it more, and their
   applications would of course suck bigtime when run on earlier 2.6
   kernels.

Bottom line: we've had a _lot_ of problems with the new yield() semantics.
We effectively broke back-compatibility by changing its behaviour a lot,
and we can't really turn around and blame application developers for that.
Re: libata-acpi: allow _GTF on SATA, but disable on PATA for now
On Saturday 10 March 2007 06:30, Jeff Garzik wrote:
> Linux Kernel Mailing List wrote:
> > Gitweb:     http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=df33c77e3981e71afc8727ee5c432ba1a1bba68c
> > Commit:     df33c77e3981e71afc8727ee5c432ba1a1bba68c
> > Parent:     908e0a8a265fe8057604a9a30aec3f0be7bb5ebb
> > Author:     Kristen Accardi <[EMAIL PROTECTED]>
> > AuthorDate: Fri Mar 9 18:15:33 2007 -0500
> > Committer:  Len Brown <[EMAIL PROTECTED]>
> > CommitDate: Fri Mar 9 18:15:33 2007 -0500
> >
> >     libata-acpi: allow _GTF on SATA, but disable on PATA for now
> >
> >     The ACPI specification states, and BIOS implementations depend on,
> >     _STM being called before _GTF.
> >
> >     SATA does this, but PATA does not. So for now, simply
> >     prevent execution of _GTF on PATA devices. Longer term we
> >     should implement ACPI support for PATA devices in libata.
> >
> >     Signed-off-by: Kristen Accardi <[EMAIL PROTECTED]>
> >     Signed-off-by: Len Brown <[EMAIL PROTECTED]>
> > ---
> >  drivers/ata/libata-acpi.c |    7 +++
> >  1 files changed, 7 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/ata/libata-acpi.c b/drivers/ata/libata-acpi.c
> > index d14a48e..89aaf74 100644
> > --- a/drivers/ata/libata-acpi.c
> > +++ b/drivers/ata/libata-acpi.c
> > @@ -561,6 +561,13 @@ int ata_acpi_exec_tfs(struct ata_port *ap)
> >
> >  	if (noacpi)
> >  		return 0;
> > +	/*
> > +	 * TBD - implement PATA support. For now,
> > +	 * we should not run GTF on PATA devices since some
> > +	 * PATA require execution of GTM/STM before GTF.
> > +	 */
> > +	if (!(ap->cbl == ATA_CBL_SATA))
> > +		return 0;
> >
> >  	for (ix = 0; ix < ATA_MAX_DEVICES; ix++) {
> >  		if (!ata_dev_enabled(&ap->device[ix]))
>
> Grumble!
>
> This /really/ should have gone through me and linux-ide first.

Back at you Jeff,
This feature /really/ should have never gone upstream in the first place,
as this failure was reported and isolated to git-libata-all.patch back in
2.6.20-rc6-mm3:
http://bugzilla.kernel.org/show_bug.cgi?id=7907

It then went on to become the most widely reported "ACPI related"
regression in the 2.6.21-rc series -- for which ACPI gets smeared.
Thank you ATA...

> Alan has been actively working on PATA ACPI, and we have been debugging
> ACPI issues as well. PLEASE coordinate with the maintainer, when
> touching code outside of drivers/acpi!

And PLEASE coordinate with the maintainer when invoking methods that
provoke errors in other sub-systems!

Re: "debugging ACPI issues as well"
What issues? Why haven't I seen any mention of them on linux-acpi?
Coordination and communication is a two-way street, Jeff.

> AFAICS this patch went in with zero appearance on LKML or another
> related list, until submission. This is /not/ how we do Linux development.

I proudly take credit+blame for shipping Kristen's patch with no delay.
It did appear on linux-acpi, as do all the patches I ship -- though I
admit it was the same day it went upstream. I'm sorry I didn't CC
linux-ide -- I'll get that part right next time.

However, I believe that late -rc3 is _well_ past the time to be developing
new code real-time in the upstream tree; and is instead time to shut the
damn thing off and set sights on the next release.

If you disagree with me, I'm not going to object when you send a better
fix to Linus for 2.6.21-rc4. However, I do request that you first either
start responding to bugzilla traffic, or delete your account from bugzilla
so that people don't get the false impression that you're paying attention.

thanks,
-Len
Re: [PATCH] Use more gcc extensions in the Linux headers
On Mar 11 2007 13:50, Rusty Russell wrote:
>On Sat, 2007-03-10 at 02:04 +0100, Jan Engelhardt wrote:
>> Getting back at the macro, how would you like to have it merged?
>
>Well, this is what I sent to Linus and Andrew (many thanks to those who
>made appropriately whimsical *or* useful comments):
>
>diff -r 1ccdf46b0f41 include/linux/compiler-gcc.h
>--- a/include/linux/compiler-gcc.h	Sat Mar 10 09:55:29 2007 +1100
>+++ b/include/linux/compiler-gcc.h	Sat Mar 10 11:06:35 2007 +1100
>@@ -22,6 +22,9 @@
>     __asm__ ("" : "=r"(__ptr) : "0"(ptr));		\
>     (typeof(ptr)) (__ptr + (off)); })
>
>+/* &a[0] degrades to a pointer: a different type from an array */
>+#define __must_be_array(a) \
>+  BUILD_BUG_ON_ZERO(__builtin_types_compatible_p(typeof(a), typeof(&a[0])))

This looks _much_ nicer!
(And BUILD_BUG_ON is also appropriately commented.)

>
> #define inline		inline		__attribute__((always_inline))
> #define __inline__	__inline__	__attribute__((always_inline))
>diff -r 1ccdf46b0f41 include/linux/compiler-intel.h
>--- a/include/linux/compiler-intel.h	Sat Mar 10 09:55:29 2007 +1100
>+++ b/include/linux/compiler-intel.h	Sat Mar 10 11:06:25 2007 +1100
>@@ -21,4 +21,7 @@
>      __ptr = (unsigned long) (ptr);				\
>     (typeof(ptr)) (__ptr + (off)); })
>
>+/* Intel ECC compiler doesn't support __builtin_types_compatible_p() */
>+#define __must_be_array(a) 0
>+
> #endif
>diff -r 1ccdf46b0f41 include/linux/kernel.h
>--- a/include/linux/kernel.h	Sat Mar 10 09:55:29 2007 +1100
>+++ b/include/linux/kernel.h	Sat Mar 10 11:06:16 2007 +1100
>@@ -35,7 +35,8 @@ extern const char linux_proc_banner[];
> #define ALIGN(x,a)		__ALIGN_MASK(x,(typeof(x))(a)-1)
> #define __ALIGN_MASK(x,mask)	(((x)+(mask))&~(mask))
>
>-#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
>+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
>+
> #define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f))
> #define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
> #define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

80 cols *cough* :)

Jan
Re: [PATCH] Use more gcc extensions in the Linux headers
On Sat, 2007-03-10 at 02:04 +0100, Jan Engelhardt wrote:
> Getting back at the macro, how would you like to have it merged?

Well, this is what I sent to Linus and Andrew (many thanks to those who
made appropriately whimsical *or* useful comments):

diff -r 1ccdf46b0f41 include/linux/compiler-gcc.h
--- a/include/linux/compiler-gcc.h	Sat Mar 10 09:55:29 2007 +1100
+++ b/include/linux/compiler-gcc.h	Sat Mar 10 11:06:35 2007 +1100
@@ -22,6 +22,9 @@
     __asm__ ("" : "=r"(__ptr) : "0"(ptr));		\
     (typeof(ptr)) (__ptr + (off)); })
 
+/* &a[0] degrades to a pointer: a different type from an array */
+#define __must_be_array(a) \
+  BUILD_BUG_ON_ZERO(__builtin_types_compatible_p(typeof(a), typeof(&a[0])))
 
 #define inline		inline		__attribute__((always_inline))
 #define __inline__	__inline__	__attribute__((always_inline))
diff -r 1ccdf46b0f41 include/linux/compiler-intel.h
--- a/include/linux/compiler-intel.h	Sat Mar 10 09:55:29 2007 +1100
+++ b/include/linux/compiler-intel.h	Sat Mar 10 11:06:25 2007 +1100
@@ -21,4 +21,7 @@
      __ptr = (unsigned long) (ptr);				\
     (typeof(ptr)) (__ptr + (off)); })
 
+/* Intel ECC compiler doesn't support __builtin_types_compatible_p() */
+#define __must_be_array(a) 0
+
 #endif
diff -r 1ccdf46b0f41 include/linux/kernel.h
--- a/include/linux/kernel.h	Sat Mar 10 09:55:29 2007 +1100
+++ b/include/linux/kernel.h	Sat Mar 10 11:06:16 2007 +1100
@@ -35,7 +35,8 @@ extern const char linux_proc_banner[];
 #define ALIGN(x,a)		__ALIGN_MASK(x,(typeof(x))(a)-1)
 #define __ALIGN_MASK(x,mask)	(((x)+(mask))&~(mask))
 
-#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
+
 #define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f))
 #define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
 #define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
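For anyone who wants to see the array check in action outside the kernel tree, here is a minimal userspace sketch of the same idea. It assumes GCC or Clang (for __builtin_types_compatible_p and __typeof__), and the BUILD_BUG_ON_ZERO definition below is a stand-in written for this example, not necessarily the one in the tree. The primes array and count_primes() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Stand-in for the kernel's BUILD_BUG_ON_ZERO (an assumption for this
 * sketch): evaluates to 0 when e is false, and fails to compile when e
 * is true because the array size becomes negative.
 */
#define BUILD_BUG_ON_ZERO(e) (sizeof(char[1 - 2 * !!(e)]) - 1)

/* &a[0] degrades to a pointer: a different type from an array */
#define __must_be_array(a) \
	BUILD_BUG_ON_ZERO(__builtin_types_compatible_p(__typeof__(a), \
						       __typeof__(&a[0])))

#define ARRAY_SIZE(arr) \
	(sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))

/* Hypothetical test data: a real array, so the check passes. */
static int primes[] = { 2, 3, 5, 7, 11 };

size_t count_primes(void)
{
	return ARRAY_SIZE(primes);
}

/*
 * Passing a pointer instead of an array is rejected at build time:
 *
 *	int *p = primes;
 *	ARRAY_SIZE(p);	// error: size of array is negative
 */
```

The point of the design is that the check costs nothing at run time: for a genuine array the extra term is a compile-time 0, and for a pointer the build stops instead of silently computing sizeof(pointer) / sizeof(element).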
Re: RSDL-mm 0.28
On 11/03/07, Matt Mackall <[EMAIL PROTECTED]> wrote:
> I've tested -mm2 against -mm2+noyield and -mm2+rsdl+noyield. The
> noyield patch simply makes the sched_yield syscall return immediately.
> Xorg and all tests are run at nice 0.
>
> Loads:
>  memload: constant memcpy of 16MB buffer
>  execload: constant re-exec of a trivial shell script
>  forkload: constant fork and exit of a trivial shell script
>  make -j 5: hot-cache kernel build without ccache
>  make -j 5 ccache: hot-cache kernel build with ccache
>
> Tests:
>  beryl - 3D window manager, wiggle windows, spin desktop, etc.
>  galeon - web browser, rapidly scrolling long web pages by grabbing
>   the scroll bar
>  mp3 - XMMS on a FUSE sshfs over wireless (during all tests)
>  terminal - responsiveness of ssh and local terminal sessions
>  mouse - responsiveness of mouse pointer
>
> Results:
>  great = completely smooth
>  good = fully responsive
>  ok = visible latency
>  bad = becomes difficult to use (or mp3 skips)
>  awful = make it stop, please
>
>                    -mm2        -mm2+noyield  rsdl+noyield
> no load
>   beryl            great       great         great
>   galeon           good        good          good
>   mp3              good        good          good
>   terminal         good        good          good
>   mouse            good        good          good
> memload x10
>   beryl            awful/bad   great         good
>   galeon           good        good          ok/good
>   mp3              good        good          good
>   terminal         good        good          good
>   mouse            good        good          good
> execload x10
>   beryl            awful/bad   bad/good      good
>   galeon           good        bad/good      ok/good
>   mp3              good        bad           good
>   terminal         good        bad/good      good
>   mouse            good        bad/good      good
> forkload x10
>   beryl            good        good          great
>   galeon           good        good          ok/good
>   mp3              good        good          good
>   terminal         good        good          ok/good
>   mouse            good        good          good
> make -j 5
>   beryl            ok          good          good/great
>   galeon           good        good          ok/good
>   mp3              good        good          good
>   terminal         good        good          good
>   mouse            good        good          good
> make -j 5 ccache
>   beryl            ok          good          awful
>   galeon           good        good          bad
>   mp3              good        good          bad
>   terminal         good        good          bad/ok
>   mouse            good        good          bad/ok
>
> make -j 5
>   real             8m1.857s    8m50.659s     8m9.282s
>   user             7m19.127s   8m3.494s      7m30.740s
>   sys              0m30.910s   0m33.722s     0m29.542s
> make -j 5 ccache
>   real             2m6.182s    2m19.032s     2m1.832s
>   user             1m39.466s   1m48.787s     1m37.250s
>   sys              0m19.741s   0m22.993s     0m20.109s

Thanks very much for that comprehensive summary and testing!

> There's a substantial performance hit for not yielding, so we probably
> want to investigate alternate semantics for it. It seems reasonable
> for apps to say "let me not hog the CPU" without completely expiring
> them. Imagine you're in the front of the line (aka queue) and you
> spend a moment fumbling for your wallet. The polite thing to do is to
> let the next guy in front. But with the current sched_yield, you go
> all the way to the back of the line.

Well... are you advocating we change sched_yield semantics to a
gentler form? This is a cinch to implement but I know how Ingo feels
about this. It will only encourage more lax coding using sched_yield
instead of proper blocking (see huge arguments with the ldap people on
this one who insist it's impossible not to use yield).

> RSDL makes most of the noyield hit back in normal make and then some
> with ccache. Impressive. But ccache is still destroying interactivity
> somehow. The ccache effect is fairly visible even with non-parallel
> 'make'.

Ok I don't think there's any actual accounting problem here per se
(although I did just recently post a bugfix for rsdl however I think
that's unrelated). What I think is going on in the ccache testcase is
that all the work is being offloaded to kernel threads reading/writing
to/from the filesystem and the make is not getting any actual cpu
time. This is "worked around" in mainline thanks to the testing for
sleeping on uninterruptible sleep in the interactivity estimator. What
I suspect is happening is kernel threads that are running nice -5 are
doing all the work on make's behalf in the setting of ccache since it
is mostly i/o bound. The reason for -nice values on kernel threads is
questionable anyway. Can you try renicing your kernel threads all to
nice 0 and see what effect that has? Obviously this doesn't need a
recompile, but is simple enough to implement in kthread code as a new
default.

> Also note I could occasionally trigger nasty multi-second pauses with
> -mm2+noyield under exectest that didn't show up elsewhere. That's
> probably a bug in the mainline scheduler.

Ew. It's probably not a bug but a good example of some of the
starvation scenarios we're hitting on mainline (hence the need for a
rewrite ;))
Re: [RFC PATCH 1/3] Add ability to keep track of callers of symbol_(get|put)
On Sat, 2007-03-10 at 02:31 -0200, Mauro Carvalho Chehab wrote:
> From: Trent Piepho <[EMAIL PROTECTED]>
>
> When a module uses symbol_get() to increase the ref count of another
> module, there is no record what module called symbol_get(). A module
> can show up as having other users, but there is no way to tell who
> those users are.

Hi Mauro,

Interesting: in general you cannot tell who is using a module, but
for this case it makes sense. Your patch was linewrapped here, but it
all looks fine.

Thanks,
Rusty.
[patch 7/9] signalfd/timerfd - timerfd wire up i386 arch ...
This patch wires the timerfd system call to the i386 architecture.

Signed-off-by: Davide Libenzi

- Davide

Index: linux-2.6.20.ep2/arch/i386/kernel/syscall_table.S
===
--- linux-2.6.20.ep2.orig/arch/i386/kernel/syscall_table.S	2007-03-10 15:57:58.0 -0800
+++ linux-2.6.20.ep2/arch/i386/kernel/syscall_table.S	2007-03-10 15:58:08.0 -0800
@@ -320,3 +320,4 @@
 	.long sys_getcpu
 	.long sys_epoll_pwait
 	.long sys_signalfd		/* 320 */
+	.long sys_timerfd
Index: linux-2.6.20.ep2/include/asm-i386/unistd.h
===
--- linux-2.6.20.ep2.orig/include/asm-i386/unistd.h	2007-03-10 15:57:58.0 -0800
+++ linux-2.6.20.ep2/include/asm-i386/unistd.h	2007-03-10 15:58:08.0 -0800
@@ -326,10 +326,11 @@
 #define __NR_getcpu		318
 #define __NR_epoll_pwait	319
 #define __NR_signalfd		320
+#define __NR_timerfd		321
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 321
+#define NR_syscalls 322
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
[patch 8/9] signalfd/timerfd - timerfd wire up x86_64 arch ...
This patch wires the timerfd system call to the x86_64 architecture.

Signed-off-by: Davide Libenzi

- Davide

Index: linux-2.6.20.ep2/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.20.ep2.orig/arch/x86_64/ia32/ia32entry.S	2007-03-10 15:58:00.0 -0800
+++ linux-2.6.20.ep2/arch/x86_64/ia32/ia32entry.S	2007-03-10 15:58:10.0 -0800
@@ -720,4 +720,5 @@
 	.quad sys_getcpu
 	.quad sys_epoll_pwait
 	.quad sys_signalfd		/* 320 */
+	.quad sys_timerfd
 ia32_syscall_end:
Index: linux-2.6.20.ep2/include/asm-x86_64/unistd.h
===
--- linux-2.6.20.ep2.orig/include/asm-x86_64/unistd.h	2007-03-10 15:58:00.0 -0800
+++ linux-2.6.20.ep2/include/asm-x86_64/unistd.h	2007-03-10 15:58:10.0 -0800
@@ -621,8 +621,10 @@
 __SYSCALL(__NR_move_pages, sys_move_pages)
 #define __NR_signalfd		280
 __SYSCALL(__NR_signalfd, sys_signalfd)
+#define __NR_timerfd		281
+__SYSCALL(__NR_timerfd, sys_timerfd)
 
-#define __NR_syscall_max __NR_signalfd
+#define __NR_syscall_max __NR_timerfd
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR
[patch 9/9] signalfd/timerfd - timerfd compat code ...
This patch implements the necessary compat code for the timerfd system call.

Signed-off-by: Davide Libenzi

- Davide

Index: linux-2.6.20.ep2/fs/compat.c
===
--- linux-2.6.20.ep2.orig/fs/compat.c	2007-03-10 15:58:03.0 -0800
+++ linux-2.6.20.ep2/fs/compat.c	2007-03-10 15:58:12.0 -0800
@@ -2257,3 +2257,23 @@
 	return sys_signalfd(ufd, ksigmask, sizeof(sigset_t));
 }
 
+
+asmlinkage long compat_sys_timerfd(int ufd, int clockid, int tmrtype,
+				   const struct compat_timespec __user *utmr)
+{
+	long res;
+	struct timespec t;
+	struct timespec __user *ut;
+
+	res = -EFAULT;
+	if (get_compat_timespec(&t, utmr))
+		goto err_exit;
+	ut = compat_alloc_user_space(sizeof(*ut));
+	if (copy_to_user(ut, &t, sizeof(t)) )
+		goto err_exit;
+
+	res = sys_timerfd(ufd, clockid, tmrtype, ut);
+err_exit:
+	return res;
+}
+
[patch 4/9] signalfd/timerfd - signalfd wire up x86_64 arch ...
This patch wires the signalfd system call to the x86_64 architecture.

Signed-off-by: Davide Libenzi

- Davide

Index: linux-2.6.20.ep2/include/asm-x86_64/unistd.h
===
--- linux-2.6.20.ep2.orig/include/asm-x86_64/unistd.h	2007-03-10 15:57:00.0 -0800
+++ linux-2.6.20.ep2/include/asm-x86_64/unistd.h	2007-03-10 15:58:00.0 -0800
@@ -619,8 +619,10 @@
 __SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages		279
 __SYSCALL(__NR_move_pages, sys_move_pages)
+#define __NR_signalfd		280
+__SYSCALL(__NR_signalfd, sys_signalfd)
 
-#define __NR_syscall_max __NR_move_pages
+#define __NR_syscall_max __NR_signalfd
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR
Index: linux-2.6.20.ep2/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.20.ep2.orig/arch/x86_64/ia32/ia32entry.S	2007-03-10 15:57:00.0 -0800
+++ linux-2.6.20.ep2/arch/x86_64/ia32/ia32entry.S	2007-03-10 15:58:00.0 -0800
@@ -714,8 +714,10 @@
 	.quad compat_sys_get_robust_list
 	.quad sys_splice
 	.quad sys_sync_file_range
-	.quad sys_tee
+	.quad sys_tee			/* 315 */
 	.quad compat_sys_vmsplice
 	.quad compat_sys_move_pages
 	.quad sys_getcpu
-ia32_syscall_end:
+	.quad sys_epoll_pwait
+	.quad sys_signalfd		/* 320 */
+ia32_syscall_end:
[patch 6/9] signalfd/timerfd - timerfd core ...
This patch introduces a new system call for timer events delivered
through file descriptors. This allows timer events to be used with
standard POSIX poll(2), select(2) and read(2). As a consequence of
supporting the Linux f_op->poll subsystem, they can be used with
epoll(2) too.

The system call is defined as:

int timerfd(int ufd, int clockid, int tmrtype,
            const struct timespec *utmr);

The "ufd" parameter allows for re-use (re-programming) of an existing
timerfd w/out going through the close/open cycle (same as signalfd).
If "ufd" is -1, a new file descriptor will be created, otherwise the
existing "ufd" will be re-programmed.

The "clockid" parameter is either CLOCK_MONOTONIC or CLOCK_REALTIME.

The "tmrtype" parameter specifies the timer type. The following values
are supported:

TFD_TIMER_REL
    The time specified in the "utmr" parameter is a relative time
    from NOW.

TFD_TIMER_ABS
    The time specified in the "utmr" parameter is an absolute time.

TFD_TIMER_SEQ
    The time specified in the "utmr" parameter is an interval at
    which a continuous clock rate will be generated.

The function returns the new (or same, in case "ufd" is a valid timerfd
descriptor) file, or -1 in case of error.

As stated before, the timerfd file descriptor supports poll(2),
select(2) and epoll(2). When a timer event happens on the timerfd, a
POLLIN mask will be returned.

The read(2) call can be used, and it will return a u32 variable holding
the number of "ticks" that happened on the interface since the last
call to read(2). The read(2) call supports the O_NONBLOCK flag too, and
EAGAIN will be returned if no ticks happened.
A quick test program, shows timerfd working correctly on my amd64 box: http://www.xmailserver.org/timerfd-test.c Signed-off-by: Davide Libenzi - Davide Index: linux-2.6.20.ep2/fs/timerfd.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6.20.ep2/fs/timerfd.c 2007-03-10 15:58:05.0 -0800 @@ -0,0 +1,268 @@ +/* + * fs/timerfd.c + * + * Copyright (C) 2007 Davide Libenzi + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + + + +struct timerfd_ctx { + struct hrtimer tmr; + int clockid; + ktime_t tval; + int tmrtype; + spinlock_t lock; + wait_queue_head_t wqh; + unsigned long ticks; +}; + + +static int timerfd_tmrproc(struct hrtimer *htmr); +static void timerfd_cleanup(struct timerfd_ctx *ctx); +static int timerfd_close(struct inode *inode, struct file *file); +static unsigned int timerfd_poll(struct file *file, poll_table *wait); +static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count, + loff_t *ppos); + + + +static const struct file_operations timerfd_fops = { + .release= timerfd_close, + .poll = timerfd_poll, + .read = timerfd_read, +}; +static struct kmem_cache *timerfd_ctx_cachep; + + + +static int timerfd_tmrproc(struct hrtimer *htmr) +{ + struct timerfd_ctx *ctx = container_of(htmr, struct timerfd_ctx, tmr); + int rval = HRTIMER_NORESTART; + unsigned long flags; + + spin_lock_irqsave(>lock, flags); + ctx->ticks++; + wake_up_locked(>wqh); + if (ctx->tmrtype == TFD_TIMER_SEQ) { + hrtimer_forward(htmr, htmr->base->softirq_time, ctx->tval); + rval = HRTIMER_RESTART; + } + spin_unlock_irqrestore(>lock, flags); + + return rval; +} + + +asmlinkage long sys_timerfd(int ufd, int clockid, int tmrtype, + const struct timespec __user *utmr) +{ + int error; + struct timerfd_ctx *ctx; + struct file *file; + struct inode *inode; + ktime_t tval, tnow; + struct timespec ktmr, tmrnow; + + error = -EFAULT; + if 
(copy_from_user(, utmr, sizeof(ktmr))) + goto err_exit; + + tval = timespec_to_ktime(ktmr); + error = -EINVAL; + if (clockid != CLOCK_MONOTONIC && + clockid != CLOCK_REALTIME) + goto err_exit; + switch (tmrtype) { + case TFD_TIMER_REL: + case TFD_TIMER_SEQ: + break; + case TFD_TIMER_ABS: + getnstimeofday(); + tnow = timespec_to_ktime(tmrnow); + if (ktime_to_ns(tval) <= ktime_to_ns(tnow)) + goto err_exit; + tval = ktime_sub(tval, tnow); + break; + default: + goto err_exit; + } + + if (ufd == -1) { + error = -ENOMEM; + ctx = kmem_cache_alloc(timerfd_ctx_cachep, GFP_KERNEL); + if (!ctx) + goto err_exit; + +
[patch 5/9] signalfd/timerfd - signalfd compat code ...
This patch implements the necessary compat code for the signalfd system call.

Signed-off-by: Davide Libenzi

- Davide

Index: linux-2.6.20.ep2/fs/compat.c
===
--- linux-2.6.20.ep2.orig/fs/compat.c	2007-03-10 15:57:00.0 -0800
+++ linux-2.6.20.ep2/fs/compat.c	2007-03-10 15:58:03.0 -0800
@@ -46,6 +46,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -2235,3 +2236,24 @@
 	return sys_ni_syscall();
 }
 #endif
+
+asmlinkage long compat_sys_signalfd(int ufd,
+				    const compat_sigset_t __user *sigmask,
+				    compat_size_t sigsetsize)
+{
+	compat_sigset_t ss32;
+	sigset_t tmp;
+	sigset_t __user *ksigmask;
+
+	if (sigsetsize != sizeof(compat_sigset_t))
+		return -EINVAL;
+	if (copy_from_user(&ss32, sigmask, sizeof(ss32)))
+		return -EFAULT;
+	sigset_from_compat(&tmp, &ss32);
+	ksigmask = compat_alloc_user_space(sizeof(sigset_t));
+	if (copy_to_user(ksigmask, &tmp, sizeof(sigset_t)))
+		return -EFAULT;
+
+	return sys_signalfd(ufd, ksigmask, sizeof(sigset_t));
+}
+
[patch 1/9] signalfd/timerfd - anonymous inode source ...
This patch add an anonymous inode source, to be used for files that need and inode only in order to create a file*. We do not care of having an inode for each file, and we do not even care of having different names in the associated dentries (dentry names will be same for classes of file*). This allow code reuse, and will be used by epoll, signalfd and timerfd (and whatever else there'll be). Signed-off-by: Davide Libenzi - Davide Index: linux-2.6.20.ep2/fs/anon_inodes.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6.20.ep2/fs/anon_inodes.c 2007-03-10 15:57:47.0 -0800 @@ -0,0 +1,203 @@ +/* + * fs/anon_inodes.c + * + * Copyright (C) 2007 Davide Libenzi + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + + + +static int ainofs_delete_dentry(struct dentry *dentry); +static struct inode *aino_getinode(void); +static struct inode *aino_mkinode(void); +static int ainofs_get_sb(struct file_system_type *fs_type, int flags, +const char *dev_name, void *data, struct vfsmount *mnt); + + + +static struct vfsmount *aino_mnt __read_mostly; +static struct inode *aino_inode; +static struct file_operations aino_fops = { }; +static struct file_system_type aino_fs_type = { + .name = "ainofs", + .get_sb = ainofs_get_sb, + .kill_sb= kill_anon_super, +}; +static struct dentry_operations ainofs_dentry_operations = { + .d_delete = ainofs_delete_dentry, +}; + + + +int aino_getfd(int *pfd, struct inode **pinode, struct file **pfile, + char const *name, const struct file_operations *fops, void *priv) +{ + struct qstr this; + struct dentry *dentry; + struct inode *inode; + struct file *file; + int error, fd; + + error = -ENFILE; + file = get_empty_filp(); + if (!file) + goto eexit_1; + + inode = aino_getinode(); + if (IS_ERR(inode)) { + error = PTR_ERR(inode); + goto eexit_2; + } + + error = get_unused_fd(); + if (error < 0) + goto eexit_3; + fd = error; + + /* +* Link the inode to a directory entry by creating 
a unique name +* using the inode sequence number. +*/ + error = -ENOMEM; + this.name = name; + this.len = strlen(name); + this.hash = 0; + dentry = d_alloc(aino_mnt->mnt_sb->s_root, ); + if (!dentry) + goto eexit_4; + dentry->d_op = _dentry_operations; + /* Do not publish this dentry inside the global dentry hash table */ + dentry->d_flags &= ~DCACHE_UNHASHED; + d_instantiate(dentry, inode); + + file->f_path.mnt = mntget(aino_mnt); + file->f_path.dentry = dentry; + file->f_mapping = inode->i_mapping; + + file->f_pos = 0; + file->f_flags = O_RDONLY; + file->f_op = fops; + file->f_mode = FMODE_READ; + file->f_version = 0; + file->private_data = priv; + + fd_install(fd, file); + + *pfd = fd; + *pinode = inode; + *pfile = file; + return 0; + +eexit_4: + put_unused_fd(fd); +eexit_3: + iput(inode); +eexit_2: + put_filp(file); +eexit_1: + return error; +} + + +static int ainofs_delete_dentry(struct dentry *dentry) +{ + /* +* We faked vfs to believe the dentry was hashed when we created it. +* Now we restore the flag so that dput() will work correctly. +*/ + dentry->d_flags |= DCACHE_UNHASHED; + return 1; +} + + +static struct inode *aino_getinode(void) +{ + return igrab(aino_inode); +} + + +/* + * A single inode exist for all aino files. On the contrary of pipes, + * aino inodes has no per-instance data associated, so we can avoid + * the allocation of multiple of them. + */ +static struct inode *aino_mkinode(void) +{ + int error = -ENOMEM; + struct inode *inode = new_inode(aino_mnt->mnt_sb); + + if (!inode) + goto eexit_1; + + inode->i_fop = _fops; + + /* +* Mark the inode dirty from the very beginning, +* that way it will never be moved to the dirty +* list because mark_inode_dirty() will think +* that it already _is_ on the dirty list. 
+*/ + inode->i_state = I_DIRTY; + inode->i_mode = S_IRUSR | S_IWUSR; + inode->i_uid = current->fsuid; + inode->i_gid = current->fsgid; + inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME; + return inode; + +eexit_1: + return ERR_PTR(error); +} + + +static int ainofs_get_sb(struct file_system_type *fs_type, int flags, +const char *dev_name, void *data, struct vfsmount *mnt) +{ + return get_sb_pseudo(fs_type, "aino:", NULL, AINOFS_MAGIC, mnt); +} + +
[patch 2/9] signalfd/timerfd - signalfd core ...
This patch series implements the new signalfd() system call.
I took part of the original Linus code (and you know how
badly it can be broken :), and I added even more breakage ;)

Signals are fetched from the same signal queue used by the process,
so signalfd will compete with standard kernel delivery in dequeue_signal().
If you want to reliably fetch signals on the signalfd file, you need to
block them with sigprocmask(SIG_BLOCK).

This seems to be working fine on my Dual Opteron machine. I made a quick
test program for it:

http://www.xmailserver.org/signafd-test.c

The signalfd() system call implements signal delivery into a file
descriptor receiver. The signalfd file descriptor is created with the
following API:

int signalfd(int ufd, const sigset_t *mask, size_t masksize);

The "ufd" parameter allows changing an existing signalfd sigmask, w/out
going through a close/create cycle (Linus idea). Use "ufd" == -1 if you
want a brand new signalfd file.

The "mask" parameter specifies the set of signals that we are
interested in. The "masksize" parameter is the size of "mask".

The signalfd fd supports the poll(2) and read(2) system calls. The poll(2)
will return POLLIN when signals are available to be dequeued. As a direct
consequence of supporting the Linux poll subsystem, the signalfd fd can be
used together with epoll(2) too.

The read(2) system call will return a "struct signalfd_siginfo" structure
in the userspace supplied buffer. The return value is the number of bytes
copied in the supplied buffer, or -1 in case of error. The read(2) call
can also return 0, in case the sighand structure to which the signalfd
was attached has been orphaned. The O_NONBLOCK flag is also supported, and
read(2) will return -EAGAIN in case no signal is available.
The format of the struct signalfd_siginfo is, and the valid fields depends of the (->code & __SI_MASK) value, in the same way a struct siginfo would: struct signalfd_siginfo { __u32 signo;/* si_signo */ __s32 err; /* si_errno */ __s32 code; /* si_code */ __u32 pid; /* si_pid */ __u32 uid; /* si_uid */ __s32 fd; /* si_fd */ __u32 tid; /* si_fd */ __u32 band; /* si_band */ __u32 overrun; /* si_overrun */ __u32 trapno; /* si_trapno */ __s32 status; /* si_status */ __s32 svint;/* si_int */ __u64 svptr;/* si_ptr */ __u64 utime;/* si_utime */ __u64 stime;/* si_stime */ __u64 addr; /* si_addr */ }; Signed-off-by: Davide Libenzi - Davide Index: linux-2.6.20.ep2/fs/signalfd.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6.20.ep2/fs/signalfd.c 2007-03-10 15:57:51.0 -0800 @@ -0,0 +1,381 @@ +/* + * fs/signalfd.c + * + * Copyright (C) 2003 Linus Torvalds + * + * Mon Mar 5, 2007: Davide Libenzi + * Changed ->read() to return a siginfo strcture instead of signal number. + * Fixed locking in ->poll(). + * Added sighand-detach notification. + * Added fd re-use in sys_signalfd() syscall. + * Now using anonymous inode source. + * Thanks to Oleg Nesterov for useful code review and suggestions. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + + + +struct signalfd_ctx { + struct list_head lnk; + wait_queue_head_t wqh; + sigset_t sigmask; + struct task_struct *tsk; +}; + + + +static struct sighand_struct *signalfd_get_sighand(struct signalfd_ctx *ctx, + unsigned long *flags); +static void signalfd_put_sighand(struct signalfd_ctx *ctx, +struct sighand_struct *sighand, +unsigned long *flags); +static void signalfd_cleanup(struct signalfd_ctx *ctx); +static int signalfd_close(struct inode *inode, struct file *file); +static unsigned int signalfd_poll(struct file *file, poll_table *wait); +static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo, +siginfo_t const *kinfo); +static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count, +loff_t *ppos); + + + +static const struct file_operations signalfd_fops = { + .release= signalfd_close, + .poll = signalfd_poll, + .read = signalfd_read, +}; +static struct kmem_cache *signalfd_ctx_cachep; + + + +static struct sighand_struct *signalfd_get_sighand(struct signalfd_ctx *ctx, + unsigned long *flags) +{ + struct sighand_struct *sighand; + + rcu_read_lock(); + sighand = lock_task_sighand(ctx->tsk, flags); + rcu_read_unlock(); + + if (sighand && list_empty(>lnk)) { +
[patch 3/9] signalfd/timerfd - signalfd wire up i386 arch ...
This patch wires the signalfd system call to the i386 architecture.

Signed-off-by: Davide Libenzi

- Davide

Index: linux-2.6.20.ep2/arch/i386/kernel/syscall_table.S
===
--- linux-2.6.20.ep2.orig/arch/i386/kernel/syscall_table.S	2007-03-10 15:57:00.0 -0800
+++ linux-2.6.20.ep2/arch/i386/kernel/syscall_table.S	2007-03-10 15:57:58.0 -0800
@@ -319,3 +319,4 @@
 	.long sys_move_pages
 	.long sys_getcpu
 	.long sys_epoll_pwait
+	.long sys_signalfd		/* 320 */
Index: linux-2.6.20.ep2/include/asm-i386/unistd.h
===
--- linux-2.6.20.ep2.orig/include/asm-i386/unistd.h	2007-03-10 15:57:00.0 -0800
+++ linux-2.6.20.ep2/include/asm-i386/unistd.h	2007-03-10 15:57:58.0 -0800
@@ -325,10 +325,11 @@
 #define __NR_move_pages		317
 #define __NR_getcpu		318
 #define __NR_epoll_pwait	319
+#define __NR_signalfd		320
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 320
+#define NR_syscalls 321
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
[QUICKLIST 3/6] i386: Use standard list manipulators for pgd_list
i386: Use standard list macros.

Get rid of generating a list via page->index and page->private. Use
page->lru instead.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc3/arch/i386/mm/pgtable.c
===
--- linux-2.6.21-rc3.orig/arch/i386/mm/pgtable.c	2007-03-10 17:42:08.0 -0800
+++ linux-2.6.21-rc3/arch/i386/mm/pgtable.c	2007-03-10 17:44:23.0 -0800
@@ -213,31 +213,12 @@ struct page *pte_alloc_one(struct mm_str
  * -- wli
  */
 DEFINE_SPINLOCK(pgd_lock);
-struct page *pgd_list;
-
-static inline void pgd_list_add(pgd_t *pgd)
-{
-	struct page *page = virt_to_page(pgd);
-	page->index = (unsigned long)pgd_list;
-	if (pgd_list)
-		set_page_private(pgd_list, (unsigned long)&page->index);
-	pgd_list = page;
-	set_page_private(page, (unsigned long)&pgd_list);
-}
-
-static inline void pgd_list_del(pgd_t *pgd)
-{
-	struct page *next, **pprev, *page = virt_to_page(pgd);
-	next = (struct page *)page->index;
-	pprev = (struct page **)page_private(page);
-	*pprev = next;
-	if (next)
-		set_page_private(next, (unsigned long)pprev);
-}
+LIST_HEAD(pgd_list);
 
 void pgd_ctor(void *pgd)
 {
 	unsigned long flags;
+	struct page *page = virt_to_page(pgd);
 
 	if (PTRS_PER_PMD == 1) {
 		memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
@@ -256,7 +237,7 @@ void pgd_ctor(void *pgd)
 			__pa(swapper_pg_dir) >> PAGE_SHIFT,
 			USER_PTRS_PER_PGD, PTRS_PER_PGD - USER_PTRS_PER_PGD);
 
-	pgd_list_add(pgd);
+	list_add(&page->lru, &pgd_list);
 	spin_unlock_irqrestore(&pgd_lock, flags);
 }
 
@@ -264,10 +245,11 @@ void pgd_ctor(void *pgd)
 void pgd_dtor(void *pgd)
 {
 	unsigned long flags; /* can be called from interrupt context */
+	struct page *page = virt_to_page(pgd);
 
 	paravirt_release_pd(__pa(pgd) >> PAGE_SHIFT);
 	spin_lock_irqsave(&pgd_lock, flags);
-	pgd_list_del(pgd);
+	list_del(&page->lru);
 	spin_unlock_irqrestore(&pgd_lock, flags);
 }
 
Index: linux-2.6.21-rc3/include/asm-i386/pgtable.h
===
--- linux-2.6.21-rc3.orig/include/asm-i386/pgtable.h	2007-03-10 17:41:48.0 -0800
+++ linux-2.6.21-rc3/include/asm-i386/pgtable.h	2007-03-10 17:42:00.0 -0800
@@ -39,7 +39,7 @@ extern pgd_t swapper_pg_dir[1024];
 
 void check_pgt_cache(void);
 extern spinlock_t pgd_lock;
-extern struct page *pgd_list;
+extern struct list_head pgd_list;
 
 static inline void pgtable_cache_init(void) {};
 void paging_init(void);
Index: linux-2.6.21-rc3/arch/i386/mm/fault.c
===
--- linux-2.6.21-rc3.orig/arch/i386/mm/fault.c	2007-03-10 17:48:04.0 -0800
+++ linux-2.6.21-rc3/arch/i386/mm/fault.c	2007-03-10 17:49:30.0 -0800
@@ -608,11 +608,10 @@ void vmalloc_sync_all(void)
 			struct page *page;
 
 			spin_lock_irqsave(&pgd_lock, flags);
-			for (page = pgd_list; page; page =
-					(struct page *)page->index)
+			list_for_each_entry(page, &pgd_list, lru)
 				if (!vmalloc_sync_one(page_address(page),
 								address)) {
-					BUG_ON(page != pgd_list);
+					BUG();
 					break;
 				}
 			spin_unlock_irqrestore(&pgd_lock, flags);
Index: linux-2.6.21-rc3/arch/i386/mm/pageattr.c
===
--- linux-2.6.21-rc3.orig/arch/i386/mm/pageattr.c	2007-03-10 17:49:44.0 -0800
+++ linux-2.6.21-rc3/arch/i386/mm/pageattr.c	2007-03-10 17:50:14.0 -0800
@@ -95,7 +95,7 @@ static void set_pmd_pte(pte_t *kpte, uns
 		return;
 
 	spin_lock_irqsave(&pgd_lock, flags);
-	for (page = pgd_list; page; page = (struct page *)page->index) {
+	list_for_each_entry(page, &pgd_list, lru) {
 		pgd_t *pgd;
 		pud_t *pud;
 		pmd_t *pmd;
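To illustrate what the conversion buys, here is a small userspace sketch of the intrusive list idiom the patch switches to. struct list_head, list_add() and list_del() are re-implemented here in simplified form (the kernel versions live in include/linux/list.h), and struct fake_page and sum_ids() are hypothetical stand-ins for struct page and for the pgd_list walkers. Locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's list primitives. */
struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Insert entry right after head, as the kernel's list_add() does. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink entry; no knowledge of the list head is needed. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = NULL;
}

/* Hypothetical stand-in for struct page with its embedded lru link. */
struct fake_page {
	int id;
	struct list_head lru;
};

static struct list_head pgd_list = LIST_HEAD_INIT(pgd_list);

/* Walk the list the way list_for_each_entry() would. */
int sum_ids(void)
{
	struct list_head *pos;
	int sum = 0;

	for (pos = pgd_list.next; pos != &pgd_list; pos = pos->next)
		sum += container_of(pos, struct fake_page, lru)->id;
	return sum;
}
```

Compared with the open-coded page->index / page->private chaining that the patch removes, the doubly linked embedded node makes deletion a two-pointer update and lets every walker share one well-known iteration pattern.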
[SLUB 0/3] SLUB: The unqueued slab allocator V5
[PATCH] SLUB The unqueued slab allocator v4

V4->V5:

- Single object slabs only for slabs > slub_max_order otherwise generate sufficient objects to avoid frequent use of the page allocator. This is necessary to compensate for fragmentation caused by frequent uses of the page allocator. We exempt slabs of PAGE_SIZE from this rule since multi object slabs require uses of fields that are in use on i386 and x86_64. See the quicklist patchset for a way to fix that issue and a patch to get rid of the PAGE_SIZE special casing.

- Drop pass through to page allocator due to page allocator fragmenting memory. The buffering through large order allocations is done in SLUB. Infrequent larger order allocations cause less fragmentation than frequent small order allocations.

- We need to update object sizes when merging slabs otherwise kzalloc will not initialize the full object (this caused the failure on various platforms).

- Padding checks before redzone checks so that we get messages about the corruption of whole slab and not about a single object.

Note that SLUB will warn on zero sized allocations. SLAB just allocates some memory. So some traces from the usb subsystem etc should be expected.

Note that the definition of the return type of ksize() is currently different between mm and Linus tree. Patch is conforming to mm.

V3->V4

- Rename /proc/slabinfo to /proc/slubinfo. We have a different format after all.

- More bug fixes and stabilization of diagnostic functions. This seems to be finally something that works wherever we test it.

- Serialize kmem_cache_create and kmem_cache_destroy via slub_lock (Adrian's idea)

- Add two new modifications (separate patches) to guarantee a minimum number of objects per slab and to pass through large allocations.

V2->V3

- Debugging and diagnostic support. This is runtime enabled and not compile time enabled. Runtime debugging can be controlled via kernel boot options on an individual slab cache basis or globally.

- Slab Trace support (For individual slab caches).

- Resiliency support: If basic sanity checks are enabled (f.e. via the F boot option) then SLUB will do the best to perform diagnostics and then continue (i.e. mark corrupted objects as used).

- Fix up numerous issues including clash of SLUB's use of page flags with i386 arch use for pmd and pgds (which are managed as slab caches, sigh).

- Dynamic per CPU array sizing.

- Explain SLUB slabcache flags

V1->V2

- Fix up various issues. Tested on i386 UP, X86_64 SMP, ia64 NUMA.

- Provide NUMA support by splitting partial lists per node.

- Better Slab cache merge support (now at around 50% of slabs)

- List slab cache aliases if slab caches are merged.

- Updated descriptions /proc/slabinfo output

This is a new slab allocator which was motivated by the complexity of the existing code in mm/slab.c. It attempts to address a variety of concerns with the existing implementation.

A. Management of object queues

A particular concern was the complex management of the numerous object queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for each allocating CPU and use objects from a slab directly instead of queueing them up.

B. Storage overhead of object queues

SLAB Object queues exist per node, per CPU. The alien cache queue even has a queue array that contains a queue for each processor on each node. For very large systems the number of queues and the number of objects that may be caught in those queues grows exponentially. On our systems with 1k nodes / processors we have several gigabytes just tied up for storing references to objects for those queues. This does not include the objects that could be on those queues. One fears that the whole memory of the machine could one day be consumed by those queues.

C. SLAB meta data overhead

SLAB has overhead at the beginning of each slab. This means that data cannot be naturally aligned at the beginning of a slab block. SLUB keeps all meta data in the corresponding struct page. Objects can be naturally aligned in the slab. F.e. a 128 byte object will be aligned at 128 byte boundaries and can fit tightly into a 4k page with no bytes left over. SLAB cannot do this.

D. SLAB has a complex cache reaper

SLUB does not need a cache reaper for UP systems. On SMP systems the per CPU slab may be pushed back into partial list but that operation is simple and does not require an iteration over a list of objects. SLAB expires per CPU, shared and alien object queues during cache reaping which may cause strange hold offs.

E. SLAB has complex NUMA policy layer support

SLUB pushes NUMA policy handling into the page allocator. This means that allocation is coarser (SLUB does interleave on a page level) but that situation was also present before 2.6.13. SLAB's application of policies to individual slab objects allocated in SLAB is certainly a performance concern
[SLUB 2/3] Enable poisoning for RCU and constructors
Enable poisoning / redzoning for slabs with constructors or SLAB_DESTROY_BY_RCU

We cannot poison the object itself but we can poison padding spaces and do the redzoning. For that we introduce another flag controlling object poisoning.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc3/mm/slub.c
===
--- linux-2.6.21-rc3.orig/mm/slub.c 2007-03-09 21:13:02.0 -0800
+++ linux-2.6.21-rc3/mm/slub.c 2007-03-09 21:13:44.0 -0800
@@ -80,6 +80,9 @@
 #define ARCH_SLAB_MINALIGN sizeof(void *)
 #endif
 
+/* Internal SLUB flags */
+#define __OBJECT_POISON 0x8000 /* Poison object */
+
 static int kmem_size = sizeof(struct kmem_cache);
 
 #ifdef CONFIG_SMP
@@ -247,8 +250,8 @@
 	if (s->objects == 1)
 		return;
 
-	if (s->flags & SLAB_POISON) {
-		memset(p, POISON_FREE, s->objsize -1);
+	if (s->flags & __OBJECT_POISON) {
+		memset(p, POISON_FREE, s->objsize - 1);
 		p[s->objsize -1] = POISON_END;
 	}
 
@@ -388,7 +391,8 @@
 		object_err(s, page, p, "Alignment padding check fails");
 
 	if (s->flags & SLAB_POISON) {
-		if (!active && (!check_bytes(p, POISON_FREE, s->objsize - 1) ||
+		if (!active && (s->flags & __OBJECT_POISON)
+			&& (!check_bytes(p, POISON_FREE, s->objsize - 1) ||
 			p[s->objsize -1] != POISON_END)) {
 			object_err(s, page, p, "Poison");
 			return 0;
@@ -1371,14 +1375,9 @@
 		strncmp(slub_debug_slabs, name, strlen(slub_debug_slabs)) == 0))
 			flags |= slub_debug;
 
-	if ((flags & SLAB_POISON) &&((flags & SLAB_DESTROY_BY_RCU) ||
-			ctor || dtor)) {
-		if (!(slub_debug & SLAB_POISON))
-			printk(KERN_WARNING "SLUB %s: Clearing SLAB_POISON "
-				"because de/constructor exists.\n",
-				s->name);
-		flags &= ~SLAB_POISON;
-	}
+	if ((flags & SLAB_POISON) && !(flags & SLAB_DESTROY_BY_RCU) &&
+			!ctor && !dtor)
+		flags |= __OBJECT_POISON;
 
 	tentative_size = ALIGN(size, calculate_alignment(align, flags));
 
@@ -1389,7 +1388,7 @@
 	 */
 	if (size == PAGE_SIZE)
 		flags &= ~(SLAB_RED_ZONE| SLAB_DEBUG_FREE | \
-			SLAB_STORE_USER | SLAB_POISON);
+			SLAB_STORE_USER | SLAB_POISON | __OBJECT_POISON);
 
 	s->name = name;
 	s->ctor = ctor;
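To make the object poisoning in the hunks above easier to follow, here is a minimal userspace sketch of the fill and check steps. It mirrors only the memset/check_bytes pattern visible in the patch. The helper names poison_object() and poison_intact() are made up for this note, check_bytes() is a simplified re-implementation, and the two byte values are assumed to be the ones from include/linux/poison.h.

```c
#include <stddef.h>
#include <string.h>

/* Assumed to match include/linux/poison.h. */
#define POISON_FREE 0x6b	/* pattern written over a freed object */
#define POISON_END  0xa5	/* marker stored in the object's last byte */

/* Hypothetical helper: fill a freed object with the poison pattern. */
static void poison_object(unsigned char *p, size_t objsize)
{
	memset(p, POISON_FREE, objsize - 1);
	p[objsize - 1] = POISON_END;
}

/* Return 1 if every byte in [p, p + len) equals value, otherwise 0. */
static int check_bytes(const unsigned char *p, unsigned char value, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (p[i] != value)
			return 0;
	return 1;
}

/*
 * Hypothetical helper: return 1 if the poison is still intact, 0 if
 * something wrote to the object after it was freed.
 */
static int poison_intact(const unsigned char *p, size_t objsize)
{
	return check_bytes(p, POISON_FREE, objsize - 1) &&
	       p[objsize - 1] == POISON_END;
}
```

An object with a constructor or under SLAB_DESTROY_BY_RCU must keep its contents across a free, so a fill like this cannot be applied to it. That is why the patch limits it to caches that get __OBJECT_POISON.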
[QUICKLIST 5/6] x86_64: Separate quicklist for pgds
x86_64: Add quicklist for pgd.

A second quicklist is useful to separate out PGD handling. We can carry the initialized pgds over to the next process needing them. This avoids the zeroing of the pgds on free that we had to introduce in the last patch.

Also clean up the pgd_list handling to use regular list macros. There is no need anymore to avoid the lru field.

Move the add/removal of the pgds to the pgdlist into the constructor / destructor. That way the implementation is congruent with i386.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc3/arch/x86_64/Kconfig
===
--- linux-2.6.21-rc3.orig/arch/x86_64/Kconfig 2007-03-10 14:00:52.0 -0800
+++ linux-2.6.21-rc3/arch/x86_64/Kconfig 2007-03-10 14:00:53.0 -0800
@@ -58,7 +58,7 @@
 
 config NR_QUICK
 	int
-	default 1
+	default 2
 
 config ISA
 	bool
Index: linux-2.6.21-rc3/arch/x86_64/mm/fault.c
===
--- linux-2.6.21-rc3.orig/arch/x86_64/mm/fault.c 2007-03-10 14:00:29.0 -0800
+++ linux-2.6.21-rc3/arch/x86_64/mm/fault.c 2007-03-10 14:00:53.0 -0800
@@ -585,7 +585,7 @@
 }
 
 DEFINE_SPINLOCK(pgd_lock);
-struct page *pgd_list;
+LIST_HEAD(pgd_list);
 
 void vmalloc_sync_all(void)
 {
@@ -605,8 +605,7 @@
 			if (pgd_none(*pgd_ref))
 				continue;
 			spin_lock(&pgd_lock);
-			for (page = pgd_list; page;
-			     page = (struct page *)page->index) {
+			list_for_each_entry(page, &pgd_list, lru) {
 				pgd_t *pgd;
 				pgd = (pgd_t *)page_address(page) + pgd_index(address);
 				if (pgd_none(*pgd))
Index: linux-2.6.21-rc3/include/asm-x86_64/pgalloc.h
===
--- linux-2.6.21-rc3.orig/include/asm-x86_64/pgalloc.h 2007-03-10 14:00:52.0 -0800
+++ linux-2.6.21-rc3/include/asm-x86_64/pgalloc.h 2007-03-10 14:00:53.0 -0800
@@ -7,6 +7,9 @@
 #include
 #include
 
+#define QUICK_PGD 0	/* We preserve special mappings over free */
+#define QUICK_PT 1	/* Other page table pages that are zero on free */
+
 #define pmd_populate_kernel(mm, pmd, pte) \
 		set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte)))
 #define pud_populate(mm, pud, pmd) \
@@ -22,88 +25,77 @@
 static inline void pmd_free(pmd_t *pmd)
 {
 	BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
-	quicklist_free(0, NULL, pmd);
+	quicklist_free(QUICK_PT, NULL, pmd);
 }
 
 static inline pmd_t *pmd_alloc_one (struct mm_struct *mm, unsigned long addr)
 {
-	return (pmd_t *)quicklist_alloc(0, GFP_KERNEL|__GFP_REPEAT, NULL);
+	return (pmd_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return (pud_t *)quicklist_alloc(0, GFP_KERNEL|__GFP_REPEAT, NULL);
+	return (pud_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline void pud_free (pud_t *pud)
 {
 	BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
-	quicklist_free(0, NULL, pud);
+	quicklist_free(QUICK_PT, NULL, pud);
 }
 
-static inline void pgd_list_add(pgd_t *pgd)
+static inline void pgd_ctor(void *x)
 {
+	unsigned boundary;
+	pgd_t *pgd = x;
 	struct page *page = virt_to_page(pgd);
 
+	/*
+	 * Copy kernel pointers in from init.
+	 */
+	boundary = pgd_index(__PAGE_OFFSET);
+	memcpy(pgd + boundary,
+		init_level4_pgt + boundary,
+		(PTRS_PER_PGD - boundary) * sizeof(pgd_t));
+
 	spin_lock(&pgd_lock);
-	page->index = (pgoff_t)pgd_list;
-	if (pgd_list)
-		pgd_list->private = (unsigned long)&page->index;
-	pgd_list = page;
-	page->private = (unsigned long)&pgd_list;
+	list_add(&page->lru, &pgd_list);
 	spin_unlock(&pgd_lock);
 }
 
-static inline void pgd_list_del(pgd_t *pgd)
+static inline void pgd_dtor(void *x)
 {
-	struct page *next, **pprev, *page = virt_to_page(pgd);
+	pgd_t *pgd = x;
+	struct page *page = virt_to_page(pgd);
 
 	spin_lock(&pgd_lock);
-	next = (struct page *)page->index;
-	pprev = (struct page **)page->private;
-	*pprev = next;
-	if (next)
-		next->private = (unsigned long)pprev;
+	list_del(&page->lru);
 	spin_unlock(&pgd_lock);
 }
 
+
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-	unsigned boundary;
-	pgd_t *pgd = (pgd_t *)quicklist_alloc(0, GFP_KERNEL|__GFP_REPEAT, NULL);
-	if (!pgd)
-		return NULL;
+	pgd_t *pgd = (pgd_t *)quicklist_alloc(QUICK_PGD,
+		GFP_KERNEL|__GFP_REPEAT, pgd_ctor);
 
-	pgd_list_add(pgd);
-	/*
-
Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...
On Sat, 2007-03-10 at 17:57 -0800, Davide Libenzi wrote:
> On Sat, 10 Mar 2007, Nicholas Miell wrote:
>
> > If that's the goal, somebody should start thinking about reducing the
> > contents of struct file to the bare minimum (i.e. not much more than a
> > file_operations pointer).
>
> That's already pretty small, and the single inode (and maybe dentry) will
> make it even smaller. Unless you want to create brazillions of signalfds,
> timerfds or asyncfds.
>

Timers don't need dentry or inode pointers or readahead state, etc., do they? (Beyond the existing VFS expectation, that is.)

> > > And the real point of the whole signalfd() is that there really *are* a
> > > lot of UNIX interfaces that basically only work with file descriptors.
> > > Not just read, but select/poll/epoll.
> >
> > It'd be useful if the polling interfaces could return small datums
> > beyond just the POLL* flags -- having to do a read on timerfd just to
> > get the overrun count has a lot of overhead for just an integer, and I
> > imagine other things would like to pass back stuff too.
> ...
> > You still want timeouts, creating/setting/destroying a timer just for
> > a single call to select/poll/epoll is probably too heavy weight.
>
> Take a look at what timerfd does and what posix timers has to do to
> implement the interface. You'll prolly stop trolling with things like "a
> lot of overhead" or "too heavy weight".

That wasn't a troll. I was talking about the timerfd()/close() overhead and the corresponding bookkeeping necessary to keep that fd around compared to just passing a struct timespec to poll or a millisecond count to epoll_wait.

> > timerfd() still leaves out the basic clock selection functionality
> > provided by both setitimer() and timer_create().
>
> That is coming as soon as I fixed my send-serie script ...

Nice.
-- 
Nicholas Miell <[EMAIL PROTECTED]>
[SLUB 3/3] Configurable slub_max_order
Add slub_max_order

Avoid slabs getting too large. No longer enforce slub_min_objects if the slab gets bigger than slub_max_order.

I am not sure if we really want this. Maybe we should make the selection of the base page size depending on page allocator defrag behavior? I.e. try to restrict allocations to order 0 and order 2 so that we can limit fragmentation?

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc3/mm/slub.c
===
--- linux-2.6.21-rc3.orig/mm/slub.c 2007-03-10 13:14:06.0 -0800
+++ linux-2.6.21-rc3/mm/slub.c 2007-03-10 13:14:11.0 -0800
@@ -1211,6 +1211,7 @@ static __always_inline struct page *get_
  * take the list_lock.
  */
 static int slub_min_order = 0;
+static int slub_max_order = 4;
 
 /*
  * Minimum number of objects per slab. This is necessary in order to
@@ -1249,7 +1250,11 @@ static int calculate_order(int size)
 			order < MAX_ORDER; order++) {
 		unsigned long slab_size = PAGE_SIZE << order;
 
-		if (slab_size < slub_min_objects * size)
+		if (slub_max_order > order &&
+				slab_size < slub_min_objects * size)
+			continue;
+
+		if (slab_size < size)
 			continue;
 
 		rem = slab_size % size;
@@ -1637,6 +1642,15 @@ static int __init setup_slub_min_order(c
 
 __setup("slub_min_order=", setup_slub_min_order);
 
+static int __init setup_slub_max_order(char *str)
+{
+	get_option (&str, &slub_max_order);
+
+	return 1;
+}
+
+__setup("slub_max_order=", setup_slub_max_order);
+
 static int __init setup_slub_min_objects(char *str)
 {
 	get_option (&str, &slub_min_objects);
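The effect of the two checks added to calculate_order() can be tried out in userspace with a sketch like the one below. It is an assumption-laden simplification, not the kernel function. The loop starts at slub_min_order rather than at the fls()-derived order, the remainder test that follows "rem = slab_size % size" in the real code is left out, and the PAGE_SIZE, MAX_ORDER and slub_min_objects values are placeholders chosen for illustration.

```c
/* Placeholder values for illustration only. */
#define PAGE_SIZE 4096UL
#define MAX_ORDER 11

static int slub_min_order = 0;
static int slub_max_order = 4;
static int slub_min_objects = 8;	/* hypothetical default */

/*
 * Simplified sketch: return the first order that passes the two checks
 * from the patch, or -1 if no order below MAX_ORDER fits.
 */
static int calculate_order_sketch(unsigned long size)
{
	int order;

	for (order = slub_min_order; order < MAX_ORDER; order++) {
		unsigned long slab_size = PAGE_SIZE << order;

		/* Below slub_max_order, insist on slub_min_objects per slab. */
		if (slub_max_order > order &&
				slab_size < slub_min_objects * size)
			continue;

		/* From slub_max_order on, a single object is enough. */
		if (slab_size < size)
			continue;

		return order;
	}
	return -1;
}
```

With these placeholder values a 1024 byte object lands in an order 1 slab holding eight objects, while a 64k object gets a single object slab at order 4 instead of being pushed up to order 7 to satisfy slub_min_objects.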
[QUICKLIST 6/6] slub: remove special casing for PAGE_SIZE slabs
Slub: Remove special casing for page sized slabs

After we have used quicklist so that arches can avoid using the slab allocator to manage page table pages we can now remove the special casing from slub.

This is against SLUB V5

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc3/mm/slub.c
===
--- linux-2.6.21-rc3.orig/mm/slub.c 2007-03-09 21:23:39.0 -0800
+++ linux-2.6.21-rc3/mm/slub.c 2007-03-09 21:24:23.0 -0800
@@ -1236,16 +1236,6 @@
 	int order;
 	int rem;
 
-	/*
-	 * If this is an order 0 page then there are no issues with
-	 * fragmentation. We can then create a slab with a single object.
-	 * We need this to support the i386 arch code that uses our
-	 * freelist field (index field) for a list pointer. We neveri
-	 * touch the freelist pointer if we just have one object
-	 */
-	if (size == PAGE_SIZE)
-		return 0;
-
 	for (order = max(slub_min_order, fls(size - 1) - PAGE_SHIFT);
 			order < MAX_ORDER; order++) {
 		unsigned long slab_size = PAGE_SIZE << order;
@@ -1386,15 +1376,6 @@
 
 	tentative_size = ALIGN(size, calculate_alignment(align, flags));
 
-	/*
-	 * PAGE_SIZE slabs are special in that they are passed through
-	 * to the page allocator. Do not do any debugging in order to avoid
-	 * increasing the size of the object.
-	 */
-	if (size == PAGE_SIZE)
-		flags &= ~(SLAB_RED_ZONE| SLAB_DEBUG_FREE | \
-			SLAB_STORE_USER | SLAB_POISON | __OBJECT_POISON);
-
 	s->name = name;
 	s->ctor = ctor;
 	s->dtor = dtor;
[QUICKLIST 2/6] i386: quicklist support
i386: Convert to quicklists

Implement the i386 management of pgd and pmds using quicklists.

The i386 management of page table pages currently uses page sized slabs. The page state is therefore mainly determined by the slab code. However, i386 also uses its own fields in the page struct to mark special pages and to build a list of pgds using the ->private and ->index field (yuck!). This has been finely tuned to work right with SLAB but SLUB needs more control over the page struct. Currently the only way for SLUB to support these slabs is through special casing PAGE_SIZE slabs.

If we use quicklists instead then we can avoid the mess, and also the overhead of manipulating page sized objects through slab.

It also allows us to use standard list manipulation macros for the pgd list using page->lru thereby simplifying the code.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc3/arch/i386/mm/init.c
===
--- linux-2.6.21-rc3.orig/arch/i386/mm/init.c 2007-03-10 13:13:32.0 -0800
+++ linux-2.6.21-rc3/arch/i386/mm/init.c 2007-03-10 13:39:23.0 -0800
@@ -695,31 +695,6 @@ int remove_memory(u64 start, u64 size)
 EXPORT_SYMBOL_GPL(remove_memory);
 #endif
 
-struct kmem_cache *pgd_cache;
-struct kmem_cache *pmd_cache;
-
-void __init pgtable_cache_init(void)
-{
-	if (PTRS_PER_PMD > 1) {
-		pmd_cache = kmem_cache_create("pmd",
-					PTRS_PER_PMD*sizeof(pmd_t),
-					PTRS_PER_PMD*sizeof(pmd_t),
-					0,
-					pmd_ctor,
-					NULL);
-		if (!pmd_cache)
-			panic("pgtable_cache_init(): cannot create pmd cache");
-	}
-	pgd_cache = kmem_cache_create("pgd",
-				PTRS_PER_PGD*sizeof(pgd_t),
-				PTRS_PER_PGD*sizeof(pgd_t),
-				0,
-				pgd_ctor,
-				PTRS_PER_PMD == 1 ? pgd_dtor : NULL);
-	if (!pgd_cache)
-		panic("pgtable_cache_init(): Cannot create pgd cache");
-}
-
 /*
  * This function cannot be __init, since exceptions don't work in that
  * section. Put this after the callers, so that it cannot be inlined.
Index: linux-2.6.21-rc3/arch/i386/mm/pgtable.c
===
--- linux-2.6.21-rc3.orig/arch/i386/mm/pgtable.c 2007-03-10 13:13:32.0 -0800
+++ linux-2.6.21-rc3/arch/i386/mm/pgtable.c 2007-03-10 13:43:39.0 -0800
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -181,9 +182,12 @@ void reserve_top_address(unsigned long r
 #endif
 }
 
+#define QUICK_PGD 0
+#define QUICK_PT 1
+
 pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 {
-	return (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
+	return (pte_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL, NULL);
 }
 
 struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
@@ -198,11 +202,6 @@ struct page *pte_alloc_one(struct mm_str
 	return pte;
 }
 
-void pmd_ctor(void *pmd, struct kmem_cache *cache, unsigned long flags)
-{
-	memset(pmd, 0, PTRS_PER_PMD*sizeof(pmd_t));
-}
-
 /*
  * List of all pgd's needed for non-PAE so it can invalidate entries
  * in both cached and uncached pgd's; not needed for PAE since the
@@ -211,8 +210,6 @@ void pmd_ctor(void *pmd, struct kmem_cac
  * against pageattr.c; it is the unique case in which a valid change
  * of kernel pagetables can't be lazily synchronized by vmalloc faults.
  * vmalloc faults work because attached pagetables are never freed.
- * The locking scheme was chosen on the basis of manfred's
- * recommendations and having no core impact whatsoever.
  * -- wli
  */
 DEFINE_SPINLOCK(pgd_lock);
@@ -238,7 +235,7 @@ static inline void pgd_list_del(pgd_t *p
 		set_page_private(next, (unsigned long)pprev);
 }
 
-void pgd_ctor(void *pgd, struct kmem_cache *cache, unsigned long unused)
+void pgd_ctor(void *pgd)
 {
 	unsigned long flags;
 
@@ -264,7 +261,7 @@ void pgd_ctor(void *pgd, struct kmem_cac
 }
 
 /* never called when PTRS_PER_PMD > 1 */
-void pgd_dtor(void *pgd, struct kmem_cache *cache, unsigned long unused)
+void pgd_dtor(void *pgd)
 {
 	unsigned long flags; /* can be called from interrupt context */
 
@@ -277,13 +274,13 @@ void pgd_dtor(void *pgd, struct kmem_cac
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
 	int i;
-	pgd_t *pgd = kmem_cache_alloc(pgd_cache, GFP_KERNEL);
+	pgd_t *pgd = quicklist_alloc(QUICK_PGD, GFP_KERNEL, pgd_ctor);
 
 	if (PTRS_PER_PMD == 1 || !pgd)
 		return pgd;
 
 	for (i = 0; i < USER_PTRS_PER_PGD; ++i) {
-		pmd_t *pmd =
[QUICKLIST 1/6] Extract quicklist implementation from IA64
Abstract quicklist from the IA64 implementation

Extract the quicklist implementation for IA64, clean it up and generalize it to:

1. Allow multiple quicklists
2. Add support for constructors and destructors.

Quicklist allocation and frees occur inline. The support for constructors / destructors and multiple quicklists can therefore be optimized out of the final code for an arch.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc3/arch/ia64/mm/init.c
===
--- linux-2.6.21-rc3.orig/arch/ia64/mm/init.c 2007-03-10 11:34:00.0 -0800
+++ linux-2.6.21-rc3/arch/ia64/mm/init.c 2007-03-10 11:50:46.0 -0800
@@ -39,9 +39,6 @@
 
 DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
 
-DEFINE_PER_CPU(unsigned long *, __pgtable_quicklist);
-DEFINE_PER_CPU(long, __pgtable_quicklist_size);
-
 extern void ia64_tlb_init (void);
 
 unsigned long MAX_DMA_ADDRESS = PAGE_OFFSET + 0x1UL;
@@ -56,54 +53,6 @@ EXPORT_SYMBOL(vmem_map);
 struct page *zero_page_memmap_ptr; /* map entry for zero page */
 EXPORT_SYMBOL(zero_page_memmap_ptr);
 
-#define MIN_PGT_PAGES 25UL
-#define MAX_PGT_FREES_PER_PASS 16L
-#define PGT_FRACTION_OF_NODE_MEM 16
-
-static inline long
-max_pgt_pages(void)
-{
-	u64 node_free_pages, max_pgt_pages;
-
-#ifndef CONFIG_NUMA
-	node_free_pages = nr_free_pages();
-#else
-	node_free_pages = node_page_state(numa_node_id(), NR_FREE_PAGES);
-#endif
-	max_pgt_pages = node_free_pages / PGT_FRACTION_OF_NODE_MEM;
-	max_pgt_pages = max(max_pgt_pages, MIN_PGT_PAGES);
-	return max_pgt_pages;
-}
-
-static inline long
-min_pages_to_free(void)
-{
-	long pages_to_free;
-
-	pages_to_free = pgtable_quicklist_size - max_pgt_pages();
-	pages_to_free = min(pages_to_free, MAX_PGT_FREES_PER_PASS);
-	return pages_to_free;
-}
-
-void
-check_pgt_cache(void)
-{
-	long pages_to_free;
-
-	if (unlikely(pgtable_quicklist_size <= MIN_PGT_PAGES))
-		return;
-
-	preempt_disable();
-	while (unlikely((pages_to_free = min_pages_to_free()) > 0)) {
-		while (pages_to_free--) {
-			free_page((unsigned long)pgtable_quicklist_alloc());
-		}
-		preempt_enable();
-		preempt_disable();
-	}
-	preempt_enable();
-}
-
 void
 lazy_mmu_prot_update (pte_t pte)
 {
Index: linux-2.6.21-rc3/include/asm-ia64/pgalloc.h
===
--- linux-2.6.21-rc3.orig/include/asm-ia64/pgalloc.h 2007-03-10 11:34:00.0 -0800
+++ linux-2.6.21-rc3/include/asm-ia64/pgalloc.h 2007-03-10 12:37:56.0 -0800
@@ -18,71 +18,18 @@
 #include
 #include
 #include
+#include
 
 #include
 
-DECLARE_PER_CPU(unsigned long *, __pgtable_quicklist);
-#define pgtable_quicklist __ia64_per_cpu_var(__pgtable_quicklist)
-DECLARE_PER_CPU(long, __pgtable_quicklist_size);
-#define pgtable_quicklist_size __ia64_per_cpu_var(__pgtable_quicklist_size)
-
-static inline long pgtable_quicklist_total_size(void)
-{
-	long ql_size = 0;
-	int cpuid;
-
-	for_each_online_cpu(cpuid) {
-		ql_size += per_cpu(__pgtable_quicklist_size, cpuid);
-	}
-	return ql_size;
-}
-
-static inline void *pgtable_quicklist_alloc(void)
-{
-	unsigned long *ret = NULL;
-
-	preempt_disable();
-
-	ret = pgtable_quicklist;
-	if (likely(ret != NULL)) {
-		pgtable_quicklist = (unsigned long *)(*ret);
-		ret[0] = 0;
-		--pgtable_quicklist_size;
-		preempt_enable();
-	} else {
-		preempt_enable();
-		ret = (unsigned long *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
-	}
-
-	return ret;
-}
-
-static inline void pgtable_quicklist_free(void *pgtable_entry)
-{
-#ifdef CONFIG_NUMA
-	int nid = page_to_nid(virt_to_page(pgtable_entry));
-
-	if (unlikely(nid != numa_node_id())) {
-		free_page((unsigned long)pgtable_entry);
-		return;
-	}
-#endif
-
-	preempt_disable();
-	*(unsigned long *)pgtable_entry = (unsigned long)pgtable_quicklist;
-	pgtable_quicklist = (unsigned long *)pgtable_entry;
-	++pgtable_quicklist_size;
-	preempt_enable();
-}
-
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-	return pgtable_quicklist_alloc();
+	return quicklist_alloc(0, GFP_KERNEL, NULL);
 }
 
 static inline void pgd_free(pgd_t * pgd)
 {
-	pgtable_quicklist_free(pgd);
+	quicklist_free(0, NULL, pgd);
 }
 
 #ifdef CONFIG_PGTABLE_4
@@ -94,12 +41,12 @@ pgd_populate(struct mm_struct *mm, pgd_t
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return pgtable_quicklist_alloc();
+	return quicklist_alloc(0, GFP_KERNEL, NULL);
 }
 
 static inline void pud_free(pud_t *
[QUICKLIST 4/6] x86_64: Single Quicklist
x86_64: Convert to use a single quicklist

This adds caching of pgds and puds, pmds, pte. That way we can avoid costly zeroing and initialization of special mappings in the pgd.

The first patch just adds a simple implementation using a single quicklist. As a consequence we need to zero a pgd before returning it to the pool.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc3/arch/x86_64/Kconfig
===
--- linux-2.6.21-rc3.orig/arch/x86_64/Kconfig 2007-03-10 10:45:38.0 -0800
+++ linux-2.6.21-rc3/arch/x86_64/Kconfig 2007-03-10 12:50:47.0 -0800
@@ -56,6 +56,10 @@ config ZONE_DMA
 	bool
 	default y
 
+config NR_QUICK
+	int
+	default 1
+
 config ISA
 	bool
 
Index: linux-2.6.21-rc3/include/asm-x86_64/pgalloc.h
===
--- linux-2.6.21-rc3.orig/include/asm-x86_64/pgalloc.h 2007-03-10 10:45:39.0 -0800
+++ linux-2.6.21-rc3/include/asm-x86_64/pgalloc.h 2007-03-10 12:52:14.0 -0800
@@ -5,6 +5,7 @@
 #include
 #include
 #include
+#include
 
 #define pmd_populate_kernel(mm, pmd, pte) \
 		set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte)))
@@ -21,23 +22,23 @@ static inline void pmd_populate(struct m
 static inline void pmd_free(pmd_t *pmd)
 {
 	BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
-	free_page((unsigned long)pmd);
+	quicklist_free(0, NULL, pmd);
 }
 
 static inline pmd_t *pmd_alloc_one (struct mm_struct *mm, unsigned long addr)
 {
-	return (pmd_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+	return (pmd_t *)quicklist_alloc(0, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return (pud_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+	return (pud_t *)quicklist_alloc(0, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline void pud_free (pud_t *pud)
 {
 	BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
-	free_page((unsigned long)pud);
+	quicklist_free(0, NULL, pud);
 }
 
 static inline void pgd_list_add(pgd_t *pgd)
@@ -69,9 +70,10 @@ static inline void pgd_list_del(pgd_t *p
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
 	unsigned boundary;
-	pgd_t *pgd = (pgd_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT);
+	pgd_t *pgd = (pgd_t *)quicklist_alloc(0, GFP_KERNEL|__GFP_REPEAT, NULL);
 	if (!pgd)
 		return NULL;
+
 	pgd_list_add(pgd);
 	/*
 	 * Copy kernel pointers in from init.
@@ -90,17 +92,18 @@ static inline void pgd_free(pgd_t *pgd)
 {
 	BUG_ON((unsigned long)pgd & (PAGE_SIZE-1));
 	pgd_list_del(pgd);
-	free_page((unsigned long)pgd);
+	memset(pgd, 0, PAGE_SIZE);
+	quicklist_free(0, NULL, pgd);
 }
 
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 {
-	return (pte_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+	return (pte_t *)quicklist_alloc(0, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
 {
-	void *p = (void *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+	void *p = (void *)quicklist_alloc(0, GFP_KERNEL|__GFP_REPEAT, NULL);
 	if (!p)
 		return NULL;
 	return virt_to_page(p);
@@ -112,17 +115,21 @@ static inline struct page *pte_alloc_one
 static inline void pte_free_kernel(pte_t *pte)
 {
 	BUG_ON((unsigned long)pte & (PAGE_SIZE-1));
-	free_page((unsigned long)pte);
+	quicklist_free(0, NULL, pte);
 }
 
 static inline void pte_free(struct page *pte)
 {
 	__free_page(pte);
-}
+}
 
 #define __pte_free_tlb(tlb,pte) tlb_remove_page((tlb),(pte))
 
 #define __pmd_free_tlb(tlb,x) tlb_remove_page((tlb),virt_to_page(x))
 #define __pud_free_tlb(tlb,x) tlb_remove_page((tlb),virt_to_page(x))
 
+static inline void check_pgt_cache(void)
+{
+	quicklist_check(0, NULL);
+}
 #endif /* _X86_64_PGALLOC_H */
Index: linux-2.6.21-rc3/mm/Kconfig
===
--- linux-2.6.21-rc3.orig/mm/Kconfig 2007-03-10 11:50:46.0 -0800
+++ linux-2.6.21-rc3/mm/Kconfig 2007-03-10 12:50:47.0 -0800
@@ -168,3 +168,8 @@ config QUICKLIST
 	default y if NR_QUICK != 0
 
 
+config QUICKLIST
+	bool
+	default y if NR_QUICK != 0
+
+
Index: linux-2.6.21-rc3/arch/x86_64/kernel/process.c
===
--- linux-2.6.21-rc3.orig/arch/x86_64/kernel/process.c 2007-03-10 10:45:38.0 -0800
+++ linux-2.6.21-rc3/arch/x86_64/kernel/process.c 2007-03-10 12:52:46.0 -0800
@@ -207,6 +207,7 @@ void cpu_idle (void)
 			if (__get_cpu_var(cpu_idle_state))
 				__get_cpu_var(cpu_idle_state) = 0;
 
+
[QUICKLIST 0/6] Arch independent quicklists V1
This patchset introduces an arch independent framework to handle lists of recently used page table pages.

Page table pages have the characteristic that they are typically zero or in a known state when they are freed. This is usually exactly the same state as needed after allocation. So it makes sense to build a list of freed page table pages and then consume the pages already in use first. Those pages have already been initialized correctly (thus no need to zero them) and are likely already cached in such a way that the MMU can use them most effectively.

Such an implementation already exists for ia64. If I remember correctly it was done by Robin Holt. However, that implementation did not support constructors and destructors as needed by i386 / x86_64. It also only supported a single quicklist. The implementation here has constructor and destructor support as well as the ability for an arch to specify how many quicklists are needed.

Quicklists are defined by an arch defining the necessary number of quicklists in arch/<arch>/Kconfig. F.e. i386 needs two and thus has

config NR_QUICK
	int
	default 2

If an arch has requested quicklist support then pages can be allocated from the quicklist (or from the page allocator if the quicklist is empty) via:

quicklist_alloc(<quicklist-nr>, <gfpflags>, <constructor>)

Page table pages can be freed using:

quicklist_free(<quicklist-nr>, <destructor>, <page>)

Pages must have a definite state after allocation and before they are freed. If no constructor is specified then pages will be zeroed on allocation and must be zeroed before they are freed.

If a constructor is used then the constructor will establish a definite page state. F.e. the i386 and x86_64 pgd constructors establish certain mappings.

Constructors and destructors can also be used to track the pages. i386 and x86_64 use a list of pgds in order to be able to dynamically update standard mappings.
6 patches follow this message:

[QUICKLIST 1/6] Extract quicklist implementation from IA64
[QUICKLIST 2/6] i386: quicklist support
[QUICKLIST 3/6] i386: Use standard list manipulators for pgd_list
[QUICKLIST 4/6] x86_64: Single quicklist
[QUICKLIST 5/6] x86_64: Separate quicklist for pgds
[QUICKLIST 6/6] slub: remove special casing for PAGE_SIZE slabs
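As a rough illustration of the idea described in this cover letter, the following userspace sketch keeps freed pages on a small per-index stack and hands them back without re-initializing them. Everything in it is an assumption made for illustration: ql_alloc(), ql_free(), ql_trim() and count_ctor() are hypothetical names, calloc()/free() stand in for the page allocator, and the point at which the constructor and destructor run (only when a page enters or leaves the cache) is inferred from the description above, not copied from the patch.

```c
#include <stdlib.h>

#define PAGE_SIZE 4096
#define NR_QUICK  2	/* mirrors the "config NR_QUICK" example */

/* One quicklist: a stack of free pages linked through their first word. */
struct quicklist {
	void *page;	/* most recently freed page, or NULL */
	int nr_pages;
};

static struct quicklist quicklists[NR_QUICK];

/* Demo constructor that only counts how often it runs. */
static int ctor_calls;
static void count_ctor(void *page)
{
	(void)page;
	ctor_calls++;
}

/* Take a cached page if there is one, else get a zeroed one and construct it. */
static void *ql_alloc(int nr, void (*ctor)(void *))
{
	struct quicklist *q = &quicklists[nr];
	void **p = q->page;

	if (p) {
		q->page = p[0];	/* pop the page off the stack */
		p[0] = NULL;	/* clear the link word again */
		q->nr_pages--;
		return p;
	}
	p = calloc(1, PAGE_SIZE);
	if (p && ctor)
		ctor(p);
	return p;
}

/* Push a page, already back in its known state, onto the quicklist. */
static void ql_free(int nr, void *page)
{
	struct quicklist *q = &quicklists[nr];

	*(void **)page = q->page;
	q->page = page;
	q->nr_pages++;
}

/* Give all cached pages back, running the destructor on each first. */
static void ql_trim(int nr, void (*dtor)(void *))
{
	struct quicklist *q = &quicklists[nr];

	while (q->page) {
		void **p = q->page;

		q->page = p[0];
		p[0] = NULL;
		q->nr_pages--;
		if (dtor)
			dtor(p);
		free(p);
	}
}
```

In this sketch a page that goes through ql_free() and ql_alloc() again is returned without a second constructor call, which is the saving the cover letter is after.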
Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...
On Sat, 10 Mar 2007, Nicholas Miell wrote:

> If that's the goal, somebody should start thinking about reducing the
> contents of struct file to the bare minimum (i.e. not much more than a
> file_operations pointer).

That's already pretty small, and the single inode (and maybe dentry) will make it even smaller. Unless you want to create brazillions of signalfds, timerfds or asyncfds.

> > And the real point of the whole signalfd() is that there really *are* a
> > lot of UNIX interfaces that basically only work with file descriptors. Not
> > just read, but select/poll/epoll.
>
> It'd be useful if the polling interfaces could return small datums
> beyond just the POLL* flags -- having to do a read on timerfd just to
> get the overrun count has a lot of overhead for just an integer, and I
> imagine other things would like to pass back stuff too.

...

> You still want timeouts, creating/setting/destroying a timer just for
> a single call to select/poll/epoll is probably too heavy weight.

Take a look at what timerfd does and what posix timers has to do to implement the interface. You'll prolly stop trolling with things like "a lot of overhead" or "too heavy weight".

> timerfd() still leaves out the basic clock selection functionality
> provided by both setitimer() and timer_create().

That is coming as soon as I fix my send-series script ...

- Davide
Re: [PATCH] proc: maps protection
On Sat, Mar 10, 2007 at 04:21:01PM -0800, Andrew Morton wrote:
> We'd be needing a changelog for that.

Done; sent separately from this email.

> Please update the procfs documentation.

Done.

> Does the patch also cover /proc/pid/smaps?

Yes, and numa_maps.

Thanks!

-- 
Kees Cook
Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...
On Sat, 2007-03-10 at 16:35 -0800, Linus Torvalds wrote: > > On Sat, 10 Mar 2007, Nicholas Miell wrote: > > > > > > I'd actually much rather do POSIX timers the other way around: associate > > > a > > > generic notification mechanism with the file descriptor, and then > > > implement posix_timer_create() on top of timerfd. Now THAT sounds like a > > > clean unix-like interface ("everything is a file") and would imply that > > > you'd be able to do the same kind of notification for any file > > > descriptor, > > > not just timers. > > > > > > > But timers aren't files or even remotely file-like > > What do you think "a file" is? > > In UNIX, a file descriptor is pretty much anything. You could say that > sockets aren't remotely file-like, and you'd be right. What's your point? > If you can read on it, it's a file. Ah, I see. You're just interested in fds as a generic handle concept, and not a more Plan 9 type thing. If that's the goal, somebody should start thinking about reducing the contents of struct file to the bare minimum (i.e. not much more than a file_operations pointer). > > And the real point of the whole signalfd() is that there really *are* a > lot of UNIX interfaces that basically only work with file descriptors. Not > just read, but select/poll/epoll. It'd be useful if the polling interfaces could return small datums beyond just the POLL* flags -- having to do a read on timerfd just to get the overrun count has a lot of overhead for just an integer, and I imagine other things would like to pass back stuff too. > They currently have just one timeout, but the thing is, if UNIX had just > had "timer file descriptors", they'd not need even that one. And even with > the timeout, Davide's patch actually makes for a *better* timeout than the > ones provided by select/poll/epoll, exactly because you can do things like > repeating timers and absolute time etc. > > Much more naturally than the timer interface we currently have for those > system calls. 
> You still want timeouts, creating/setting/destroying at timer just for a single call to select/poll/epoll is probably too heavy weight. timerfd() still leaves out the basic clock selection functionality provided by both setitimer() and timer_create(). > The same goes for signals. The whole "pselect()" thing shows that signals > really *should* have been file descriptors, and suddenly you don't need > "pselect()" at all. > > So the "not remotely file-like" is not actually a real argument. One of > the big *points* of UNIX was that it unified a lot under the general > umbrella of a "file descriptor". Davide just unifies even more. > > Linus -- Nicholas Miell <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RSDL-mm 0.28
I've tested -mm2 against -mm2+noyield and -mm2+rsdl+noyield. The noyield patch simply makes the sched_yield syscall return immediately. Xorg and all tests are run at nice 0.

Loads:
memload: constant memcpy of 16MB buffer
execload: constant re-exec of a trivial shell script
forkload: constant fork and exit of a trivial shell script
make -j 5: hot-cache kernel build without ccache
make -j 5 ccache: hot-cache kernel build with ccache

Tests:
beryl - 3D window manager, wiggle windows, spin desktop, etc.
galeon - web browser, rapidly scrolling long web pages by grabbing the scroll bar
mp3 - XMMS on a FUSE sshfs over wireless (during all tests)
terminal - responsiveness of ssh and local terminal sessions
mouse - responsiveness of mouse pointer

Results:
great = completely smooth
good = fully responsive
ok = visible latency
bad = becomes difficult to use (or mp3 skips)
awful = make it stop, please

                  -mm2        -mm2+noyield  rsdl+noyield
no load
  beryl           great       great         great
  galeon          good        good          good
  mp3             good        good          good
  terminal        good        good          good
  mouse           good        good          good
memload x10
  beryl           awful/bad   great         good
  galeon          good        good          ok/good
  mp3             good        good          good
  terminal        good        good          good
  mouse           good        good          good
execload x10
  beryl           awful/bad   bad/good      good
  galeon          good        bad/good      ok/good
  mp3             good        bad           good
  terminal        good        bad/good      good
  mouse           good        bad/good      good
forkload x10
  beryl           good        good          great
  galeon          good        good          ok/good
  mp3             good        good          good
  terminal        good        good          ok/good
  mouse           good        good          good
make -j 5
  beryl           ok          good          good/great
  galeon          good        good          ok/good
  mp3             good        good          good
  terminal        good        good          good
  mouse           good        good          good
make -j 5 ccache
  beryl           ok          good          awful
  galeon          good        good          bad
  mp3             good        good          bad
  terminal        good        good          bad/ok
  mouse           good        good          bad/ok

                  -mm2        -mm2+noyield  rsdl+noyield
make -j 5
  real            8m1.857s    8m50.659s     8m9.282s
  user            7m19.127s   8m3.494s      7m30.740s
  sys             0m30.910s   0m33.722s     0m29.542s
make -j 5 ccache
  real            2m6.182s    2m19.032s     2m1.832s
  user            1m39.466s   1m48.787s     1m37.250s
  sys             0m19.741s   0m22.993s     0m20.109s

There's a substantial performance hit for not yielding, so we probably want to investigate alternate semantics for it. It seems reasonable for apps to say "let me not hog the CPU" without completely expiring them. Imagine you're in the front of the line (aka queue) and you spend a moment fumbling for your wallet. The polite thing to do is to let the next guy in front. But with the current sched_yield, you go all the way to the back of the line.

RSDL makes back most of the noyield hit in a normal make, and then some with ccache. Impressive. But ccache is still destroying interactivity somehow. The ccache effect is fairly visible even with non-parallel 'make'.

Also note I could occasionally trigger nasty multi-second pauses with -mm2+noyield under exectest that didn't show up elsewhere. That's probably a bug in the mainline scheduler.

-- 
Mathematics is the supreme nostalgia of our time.
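To make the queue analogy concrete, here is a toy model of the two yield semantics being contrasted. It is only a sketch: the "runqueue" is a plain array of task ids with the running task at index 0, and yield_to_back() and yield_one_place() are hypothetical names, not scheduler functions.

```c
#include <stddef.h>

/* Behaviour described above for the current sched_yield: go to the very back. */
void yield_to_back(int *queue, size_t n)
{
	int self;
	size_t i;

	if (n < 2)
		return;
	self = queue[0];
	for (i = 1; i < n; i++)
		queue[i - 1] = queue[i];	/* everyone else moves up one place */
	queue[n - 1] = self;
}

/* Alternate "polite" semantics: only let the next task go first. */
void yield_one_place(int *queue, size_t n)
{
	int self;

	if (n < 2)
		return;
	self = queue[0];
	queue[0] = queue[1];
	queue[1] = self;
}
```

With tasks {1, 2, 3, 4}, the first variant leaves task 1 waiting behind three others, while the second lets it run again right after task 2.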
Re: [PATCH] proc: maps protection
The /proc/pid/ "maps", "smaps", and "numa_maps" files contain sensitive information about the memory location and usage of processes. Issues:

- maps should not be world-readable, especially if programs expect any kind of ASLR protection from local attackers.
- maps cannot just be 0400 because "-D_FORTIFY_SOURCE=2 -O2" makes glibc check the maps when %n is in a *printf call, and a setuid(getuid()) process wouldn't be able to read its own maps file. (For reference see http://lkml.org/lkml/2006/1/22/150)
- a system-wide toggle is needed to allow prior behavior in the case of non-root applications that depend on access to the maps contents.

This change implements a check using "ptrace_may_attach" before allowing access to read the maps contents. To control this protection, the new knob /proc/sys/kernel/maps_protect has been added, with corresponding updates to the procfs documentation.

Signed-off-by: Kees Cook <[EMAIL PROTECTED]>
---
 CREDITS                            |    2 +-
 Documentation/filesystems/proc.txt |    7 +++
 fs/proc/base.c                     |    3 +++
 fs/proc/internal.h                 |    2 ++
 fs/proc/task_mmu.c                 |   16 +++-
 fs/proc/task_nommu.c               |    6 ++
 include/linux/sysctl.h             |    1 +
 kernel/sysctl.c                    |    9 +
 8 files changed, 44 insertions(+), 2 deletions(-)
---
diff --git a/CREDITS b/CREDITS
index 6bd8ab8..38c3ada 100644
--- a/CREDITS
+++ b/CREDITS
@@ -655,7 +655,7 @@ N: Kees Cook
 E: [EMAIL PROTECTED]
 W: http://outflux.net/
 P: 1024D/17063E6D 9FA3 C49C 23C9 D1BC 2E30 1975 1FFF 4BA9 1706 3E6D
-D: Minor updates to SCSI code for the Communications type
+D: Minor updates to SCSI types, added /proc/pid/maps protection
 S: (ask for current address)
 S: USA
 
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 5484ab5..d9b06b5 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -1137,6 +1137,13 @@ determine whether or not they are still functioning properly.
 Because the NMI watchdog shares registers with oprofile, by disabling the NMI watchdog, oprofile may have more registers to utilize.
 
+maps_protect
+
+
+Enables/Disables the protection of the per-process proc entries "maps" and
+"smaps". When enabled, the contents of these files are visible only to
+readers that are allowed to ptrace() the given process.
+
 2.4 /proc/sys/vm - The virtual memory subsystem
 ---
 
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 01f7769..6feccbc 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -123,6 +123,9 @@ struct pid_entry {
 		NULL, _info_file_operations,	\
 		{ .proc_read = _##OTYPE } )
 
+int maps_protect = 0;
+EXPORT_SYMBOL(maps_protect);
+
 static struct fs_struct *get_fs_struct(struct task_struct *task)
 {
 	struct fs_struct *fs;
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index c932aa6..2c65b6e 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -33,6 +33,8 @@ do { \
 extern int nommu_vma_show(struct seq_file *, struct vm_area_struct *);
 #endif
 
+extern int maps_protect;
+
 extern void create_seq_entry(char *name, mode_t mode, const struct file_operations *f);
 extern int proc_exe_link(struct inode *, struct dentry **, struct vfsmount **);
 extern int proc_tid_stat(struct task_struct *, char *);
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 7445980..45a0f3e 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -134,6 +134,9 @@ static int show_map_internal(struct seq_file *m, void *v, struct mem_size_stats
 	dev_t dev = 0;
 	int len;
 
+	if (maps_protect && !ptrace_may_attach(task))
+		return -EACCES;
+
 	if (file) {
 		struct inode *inode = vma->vm_file->f_path.dentry->d_inode;
 		dev = inode->i_sb->s_dev;
@@ -444,11 +447,22 @@ const struct file_operations proc_maps_operations = {
 #ifdef CONFIG_NUMA
 extern int show_numa_map(struct seq_file *m, void *v);
 
+static int show_numa_map_checked(struct seq_file *m, void *v)
+{
+	struct proc_maps_private *priv = m->private;
+	struct task_struct *task = priv->task;
+
+	if (maps_protect && !ptrace_may_attach(task))
+		return -EACCES;
+
+	return show_numa_map(m, v);
+}
+
 static struct seq_operations proc_pid_numa_maps_op = {
 	.start = m_start,
 	.next = m_next,
 	.stop = m_stop,
-	.show = show_numa_map
+	.show = show_numa_map_checked
 };
 
 static int numa_maps_open(struct inode *inode, struct file *file)
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 7cddf6b..c2747c9 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -143,6 +143,12 @@ out:
 static int show_map(struct seq_file *m, void *_vml)
 {
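The access rule the patch adds can be restated as a small truth table. The following is a user-space toy model only: maps_read_allowed() is a hypothetical helper, and reader_may_ptrace stands in for the result of the kernel's ptrace_may_attach() check, which is not reproduced here.

```c
#include <errno.h>

/*
 * Toy model of the gate added to the show routines:
 * returns 0 if the read may proceed, -EACCES otherwise.
 */
int maps_read_allowed(int maps_protect, int reader_may_ptrace)
{
	if (maps_protect && !reader_may_ptrace)
		return -EACCES;
	return 0;
}
```

With the knob at 0 (the default in the patch) every reader gets the prior behaviour; with the knob at 1 only readers that could ptrace the target see the contents.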
Re: [PATCH] proc: maps protection
On Sat, Mar 10, 2007 at 04:21:01PM -0800, Andrew Morton wrote:
> > On Sat, 10 Mar 2007 10:33:41 -0800 Kees Cook <[EMAIL PROTECTED]> wrote:
> > Here's another revision, with both the "can ptrace" and the global /proc
> > knob;
>
> We'd be needing a changelog for that.
>
> Please update the procfs documentation.
>
> Does the patch also cover /proc/pid/smaps?

Also, we ought to revisit /proc/pid/mem write, which is currently disabled. Either drop the code, fix it, or make it root only.

-- 
Mathematics is the supreme nostalgia of our time.
Re: [ck] Re: RSDL v0.28 for 2.6.20
On 3/11/07, Thibaut VARENE <[EMAIL PROTECTED]> wrote:
> On 3/11/07, Con Kolivas <[EMAIL PROTECTED]> wrote:
> > Has anyone had any trouble with RSDL on the stable kernels (ie not -mm)?
>
> Tested fine so far on ppc, ia64 and (mostly) parisc.

I meant ppc64, actually. Sorry.

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
Re: [ck] Re: RSDL v0.28 for 2.6.20
On 3/11/07, Con Kolivas <[EMAIL PROTECTED]> wrote:
> Has anyone had any trouble with RSDL on the stable kernels (ie not -mm)?

Tested fine so far on ppc, ia64 and (mostly) parisc.

HTH

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
Re: [PATCH] SCSI: Delete unused header file.
On Sat, 2007-03-10 at 17:16 -0500, Robert P. J. Day wrote:
> Delete apparently unused header file drivers/scsi/pci2000.h.

This was apparently missed by Christoph when he removed the driver ... I'll add it to the queue.

For future SCSI work, could you cc linux-scsi@vger.kernel.org please? That way, any interested parties are more likely to see the patch.

Thanks,

James
Re: Use of absolute timeouts for oneshot timers
Thomas Gleixner wrote:
> It's simply enforced in NO_HZ, HIGHRES mode as we operate in absolute
> time, which is read back from the clocksource, even if we use a relative
> value for real hardware clock event devices to program the next event.
> We calculate the delta between the absolute event and now. So we never
> get an accumulating error.
>
> What problem are you observing ?

Actually, two things. There were the unexpected pauses during boot, which are trivially fixable by not using the Xen periodic timer and using the single-shot fallback instead.

But I'm making the more general observation that if you use an absolute rather than relative time to set the single-shot timeout, then you have to deal with a long-term cumulative drift between the kernel's monotonic time and the hypervisor's monotonic time. This can happen even if your clocksource is derived directly from the hypervisor monotonic time, because running ntp will warp the kernel's time, and so it will drift with respect to the hypervisor clock. You can only avoid this by 1) not allowing adjtime, or 2) making those same adjtime warps to the hypervisor time. Neither of these is a good general solution.

Therefore, the only useful way to set a single-shot timer is by using relative rather than absolute time, and making sure the delta is not too large. The guest and hypervisor may (and in general, will) have drifting clocks, but the error will never be too large to deal with.

J
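The relative-timeout approach argued for above can be sketched as follows. This is an illustration under stated assumptions, not Xen or clockevents code: oneshot_delta_ns(), hypervisor_deadline_ns() and the two bounds are hypothetical, and all times are plain nanosecond counters.

```c
#include <stdint.h>

#define MIN_DELTA_NS 1000LL		/* hypothetical lower bound */
#define MAX_DELTA_NS 4000000000LL	/* hypothetical upper bound, about 4 s */

/* Delta between the kernel's absolute expiry and the kernel's "now", clamped. */
int64_t oneshot_delta_ns(int64_t expires_ns, int64_t now_ns)
{
	int64_t delta = expires_ns - now_ns;

	if (delta < MIN_DELTA_NS)
		delta = MIN_DELTA_NS;	/* already due: fire as soon as possible */
	if (delta > MAX_DELTA_NS)
		delta = MAX_DELTA_NS;	/* fire early; the caller reprograms */
	return delta;
}

/* The deadline handed to the hypervisor is expressed on the hypervisor's own clock. */
int64_t hypervisor_deadline_ns(int64_t hv_now_ns, int64_t expires_ns, int64_t now_ns)
{
	return hv_now_ns + oneshot_delta_ns(expires_ns, now_ns);
}
```

Because only the delta crosses the guest/hypervisor boundary, any offset between the two clocks drops out. The remaining error is the relative drift accumulated over one delta, which the upper bound keeps small.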
Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...
On Sat, 10 Mar 2007, Nicholas Miell wrote:
> >
> > I'd actually much rather do POSIX timers the other way around: associate a
> > generic notification mechanism with the file descriptor, and then
> > implement posix_timer_create() on top of timerfd. Now THAT sounds like a
> > clean unix-like interface ("everything is a file") and would imply that
> > you'd be able to do the same kind of notification for any file descriptor,
> > not just timers.
> >
>
> But timers aren't files or even remotely file-like

What do you think "a file" is?

In UNIX, a file descriptor is pretty much anything. You could say that sockets aren't remotely file-like, and you'd be right. What's your point? If you can read on it, it's a file.

And the real point of the whole signalfd() is that there really *are* a lot of UNIX interfaces that basically only work with file descriptors. Not just read, but select/poll/epoll.

They currently have just one timeout, but the thing is, if UNIX had just had "timer file descriptors", they'd not need even that one. And even with the timeout, Davide's patch actually makes for a *better* timeout than the ones provided by select/poll/epoll, exactly because you can do things like repeating timers and absolute time etc.

Much more naturally than the timer interface we currently have for those system calls.

The same goes for signals. The whole "pselect()" thing shows that signals really *should* have been file descriptors, and suddenly you don't need "pselect()" at all.

So the "not remotely file-like" is not actually a real argument. One of the big *points* of UNIX was that it unified a lot under the general umbrella of a "file descriptor". Davide just unifies even more.

		Linus
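For readers who want to see what "a repeating timer you can poll()" looks like in practice, here is a small sketch. It assumes a later Linux and glibc that provide timerfd_create() and timerfd_settime() in <sys/timerfd.h>; the interface in Davide's v1 patch under discussion differs, so treat this as an illustration of the concept rather than of that patch. count_ticks() is a hypothetical helper.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <poll.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/*
 * Arm a repeating timer and wait for it through poll().
 * Returns the total number of expirations read, or -1 on error.
 */
long count_ticks(long interval_ms, int wakeups)
{
	struct itimerspec its = {0};
	uint64_t expirations;
	long total = 0;
	int fd;

	if (interval_ms <= 0)
		return -1;	/* a zero it_value would disarm the timer */

	fd = timerfd_create(CLOCK_MONOTONIC, 0);
	if (fd < 0)
		return -1;

	its.it_value.tv_sec = interval_ms / 1000;
	its.it_value.tv_nsec = (interval_ms % 1000) * 1000000L;
	its.it_interval = its.it_value;	/* repeat at the same period */
	if (timerfd_settime(fd, 0, &its, NULL) < 0) {
		close(fd);
		return -1;
	}

	while (wakeups-- > 0) {
		struct pollfd pfd = { .fd = fd, .events = POLLIN };

		/* The timer is just another readable descriptor to poll(). */
		if (poll(&pfd, 1, -1) != 1 ||
		    read(fd, &expirations, sizeof(expirations)) !=
		    (ssize_t)sizeof(expirations)) {
			close(fd);
			return -1;
		}
		total += (long)expirations;	/* includes any missed periods */
	}
	close(fd);
	return total;
}
```

The same descriptor could sit in a poll set next to sockets and pipes, which is the unification argued for above. The read() that returns the expiration count is the step criticized elsewhere in the thread as overhead.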
Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...
On Sat, 2007-03-10 at 14:42 -0800, Linus Torvalds wrote:
>
> On Sat, 10 Mar 2007, Nicholas Miell wrote:
> >
> > Care to elaborate on why they're a horrible crock?
>
> It's a *classic* case of an interface that tries to do everything under
> the sun.
>
> Here's a clue: look at any system call that takes a union as part of its
> arguments. Count them. I think we have two:
> - struct siginfo

No argument here -- just about everything related to signals is stupidly complex.

> - struct sigevent

However, this I take issue with.

Conceptually (and what the user ends up actually using), struct sigevent is just:

struct sigevent {
	int sigev_notify;				/* delivery method */
	sigval_t sigev_value;				/* user cookie */
	int sigev_signo;				/* signal number */
	void (*sigev_notify_function)(sigval_t);	/* thread fn */
	pthread_attr_t *sigev_notify_attributes;	/* thread attr */
};

You could complain about sigval_t being a union, but that's probably just because it predates uintptr_t. (Plus, no ugly casting.)

You also could complain that the above isn't what you actually see when you look at /usr/include/bits/siginfo.h -- there's a union involved and some macros to hide the fact, but that's just internal implementation details related to how threads are created and padding out the struct for any future expansion.

The actual complexity for understanding and using struct sigevent isn't all that much, and once you've figured that out, you know how to configure event delivery for AIO completion, DNS resolution, and message queues, not just timers.

> and they are both broken horrible interfaces where the data structures
> depend on various flags.
>
> It's just not the UNIX system call way. And none of it really makes sense
> if you already have a file descriptor, since at that point you know what
> the notification mechanism is.
>
> I'd actually much rather do POSIX timers the other way around: associate a
> generic notification mechanism with the file descriptor, and then
> implement posix_timer_create() on top of timerfd. Now THAT sounds like a
> clean unix-like interface ("everything is a file") and would imply that
> you'd be able to do the same kind of notification for any file descriptor,
> not just timers.
>

But timers aren't files or even remotely file-like -- if they were real files, you could just open /dev/timers/realtime/2007/June/3rd/half-past-teatime and get a timer. (Or, more realistically, open /dev/timer and use ioctl().)

timerfd() had to be created to coerce them into some semblance of filehood just to make them work with existing (and new) polling/queuing interfaces just because those interfaces can only deal with file descriptors.

Making non-file things look like files just because that's what poll() and friends can deal with isn't much different from holding a hammer in your hand and looking for what you have to do in order to turn every problem into a nail. Sometimes you need to go back to your toolbox for a screwdriver or a saw.

> But posix timers as they are done now are just an abomination. They are
> not unix-like at all.
>
> > And are the bugs fixed? If so, why replace them? They work now.
>
> .. but the reason for the bugs was largely a very baroque interface, which
> didn't get fixed (because it's specified by the standard).
>

But the API isn't baroque.

There's a veritable boutique of clock sources to choose from, but they all serve specific needs, it's just one parameter to timer_create, and you probably want CLOCK_MONOTONIC anyway.

struct sigevent might be a bit complex, but the difficulty in learning that is amortized across all the other APIs that also use it to specify how their events are delivered.

Delivering via signals and dealing with struct siginfo is painful, but everything related to signals is painful. This is what you get when you take an interface designed essentially for exception handling and start abusing it for general information delivery. But, hey!, that's what SIGEV_THREAD and SIGEV_PORT are for.[1]

About the worst that can be said of it is that using timer_settime to both arm and disarm the timer and set the interval is awkward.

[1] A SIGEV_FUNCTION which skips all the signal baggage and just passes a supplied cookie and a purpose-specific struct pointer to an object-specific user-supplied function pointer might be interesting, but then you run into all of the reentrancy/masking/choosing which thread to deliver to and other issues that signals already have without the benefit of the existing signal infrastructure for all that stuff. Gah, I don't want to think about this anymore.

-- 
Nicholas Miell <[EMAIL PROTECTED]>
Re: [PATCH] proc: maps protection
> On Sat, 10 Mar 2007 10:33:41 -0800 Kees Cook <[EMAIL PROTECTED]> wrote:
> Here's another revision, with both the "can ptrace" and the global /proc
> knob;

We'd be needing a changelog for that.

Please update the procfs documentation.

Does the patch also cover /proc/pid/smaps?
[PATCH] MMC: Clean up low voltage range handling
Clean up the handling of low voltage MMC cards.

The latest MMC and SD specs both agree that the low voltage range is defined as 1.65-1.95V and is signified by bit 7 in the OCR. An old Sandisk spec implied that bits 7-0 represented voltages below 2.0V in 1V increments, and the code was accordingly written with that expectation.

This confusion meant that host drivers attempting to support the typical low voltage (1.8V) would set the wrong bits in the host OCR mask (usually bits 5 and 6), resulting in the low voltage mode never being used.

This change switches the code to conform to the specs and fixes the SDHCI driver. It also removes the explicit defines for the host vdd and updates the SDHCI driver to convert the bit number back to the mask value for comparisons. Having only a single set of defines ensures there's nothing to get out of sync.

Signed-off-by: Philip Langdale <[EMAIL PROTECTED]>

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index c87ce56..74ebd97 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -317,6 +317,24 @@ static u32 mmc_select_voltage(struct mmc
 {
 	int bit;
 
+	/*
+	 * Sanity check the voltages that the card claims to
+	 * support.
+	 */
+	if (ocr & 0x7F) {
+		printk("%s: card claims to support voltages below "
+		       "the defined range. These will be ignored.\n",
+		       mmc_hostname(host));
+		ocr &= ~0x7F;
+	}
+
+	if (host->mode == MMC_MODE_SD && (ocr & MMC_VDD_165_195)) {
+		printk("%s: SD card claims to support the incompletely "
+		       "defined 'low voltage range'. This will be ignored.\n",
+		       mmc_hostname(host));
+		ocr &= ~MMC_VDD_165_195;
+	}
+
 	ocr &= host->ocr_avail;
 
 	bit = ffs(ocr);
diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 86d0957..a80c043 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -668,20 +668,16 @@ static void sdhci_set_power(struct sdhci
 
 	pwr = SDHCI_POWER_ON;
 
-	switch (power) {
-	case MMC_VDD_170:
-	case MMC_VDD_180:
-	case MMC_VDD_190:
+	switch (1 << power) {
+	case MMC_VDD_165_195:
 		pwr |= SDHCI_POWER_180;
 		break;
-	case MMC_VDD_290:
-	case MMC_VDD_300:
-	case MMC_VDD_310:
+	case MMC_VDD_29_30:
+	case MMC_VDD_30_31:
 		pwr |= SDHCI_POWER_300;
 		break;
-	case MMC_VDD_320:
-	case MMC_VDD_330:
-	case MMC_VDD_340:
+	case MMC_VDD_32_33:
+	case MMC_VDD_33_34:
 		pwr |= SDHCI_POWER_330;
 		break;
 	default:
@@ -1293,7 +1289,7 @@ static int __devinit sdhci_probe_slot(st
 	if (caps & SDHCI_CAN_VDD_300)
 		mmc->ocr_avail |= MMC_VDD_29_30|MMC_VDD_30_31;
 	if (caps & SDHCI_CAN_VDD_180)
-		mmc->ocr_avail |= MMC_VDD_17_18|MMC_VDD_18_19;
+		mmc->ocr_avail |= MMC_VDD_165_195;
 
 	if (mmc->ocr_avail == 0) {
 		printk(KERN_ERR "%s: Hardware doesn't report any "
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index 43bf6a5..89dbb91 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -16,30 +16,7 @@ struct mmc_ios {
 	unsigned int	clock;			/* clock rate */
 	unsigned short	vdd;
 
-#define	MMC_VDD_150	0
-#define	MMC_VDD_155	1
-#define	MMC_VDD_160	2
-#define	MMC_VDD_165	3
-#define	MMC_VDD_170	4
-#define	MMC_VDD_180	5
-#define	MMC_VDD_190	6
-#define	MMC_VDD_200	7
-#define	MMC_VDD_210	8
-#define	MMC_VDD_220	9
-#define	MMC_VDD_230	10
-#define	MMC_VDD_240	11
-#define	MMC_VDD_250	12
-#define	MMC_VDD_260	13
-#define	MMC_VDD_270	14
-#define	MMC_VDD_280	15
-#define	MMC_VDD_290	16
-#define	MMC_VDD_300	17
-#define	MMC_VDD_310	18
-#define	MMC_VDD_320	19
-#define	MMC_VDD_330	20
-#define	MMC_VDD_340	21
-#define	MMC_VDD_350	22
-#define	MMC_VDD_360	23
+/* vdd stores the bit number of the selected voltage range from protocol.h */
 
 	unsigned char	bus_mode;		/* command output mode */
 
@@ -88,14 +65,7 @@ struct mmc_host {
 	unsigned int		f_max;
 	u32			ocr_avail;
 
-#define MMC_VDD_145_150	0x0001	/* VDD voltage 1.45 - 1.50 */
-#define MMC_VDD_150_155	0x0002	/* VDD voltage 1.50 - 1.55 */
-#define MMC_VDD_155_160	0x0004	/* VDD voltage 1.55 - 1.60 */
-#define MMC_VDD_160_165	0x0008	/* VDD voltage 1.60 - 1.65 */
-#define MMC_VDD_165_170	0x0010	/* VDD voltage
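The voltage selection the changelog describes can be modelled in user space as below. This is a sketch, not the driver code: select_vdd_bit() is a hypothetical helper, VDD_165_195 follows the "bit 7" statement in the changelog, and the two 3.2-3.4V masks are assumed bit positions used only to make the example self-contained.

```c
#include <stdint.h>
#include <strings.h>	/* ffs() */

#define VDD_165_195	0x00000080u	/* bit 7: low voltage range, per the changelog */
#define VDD_32_33	0x00100000u	/* assumed bit 20, for illustration */
#define VDD_33_34	0x00200000u	/* assumed bit 21, for illustration */

/*
 * Returns the bit number of the lowest voltage range that both the card
 * and the host support, or -1 if there is none.
 */
int select_vdd_bit(uint32_t card_ocr, uint32_t host_ocr_avail, int is_sd)
{
	card_ocr &= ~0x7Fu;		/* bits 0-6 are below the defined range */
	if (is_sd)
		card_ocr &= ~VDD_165_195;	/* ignored for SD cards, as in the patch */
	card_ocr &= host_ocr_avail;
	if (!card_ocr)
		return -1;
	return ffs((int)card_ocr) - 1;	/* ffs() counts from 1 */
}
```

The returned value is a bit number, so a host driver comparing it against the mask defines has to shift it back, which is what `switch (1 << power)` does in the SDHCI hunk.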
Re: IP Defragmentation
On Mar 8 2007 11:45, Kanhu Rauta wrote:
>
> 1>in case of fragmention i am getting only one packet at the
> hook,While analyzing the ip header it says this is the assembled
> packet(skb->len=1528,offset=0,MF=0).

conntrack reassembles fragmented packets.

> While dumping the data(for 0 to 1528 print skb->data[i]) it shows that
> only 1472 bytes are valid data and rest 28 bytes are something
> garbage.

Have you forgotten to use skb_header_pointer()?

Jan
-- 
sched rsdl fix for 0.28
Here's a big bugfix for sched rsdl 0.28

---
 kernel/sched.c |    7 +++++++
 1 file changed, 7 insertions(+)

Index: linux-2.6.21-rc3-mm2/kernel/sched.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/kernel/sched.c	2007-03-11 11:04:38.0 +1100
+++ linux-2.6.21-rc3-mm2/kernel/sched.c	2007-03-11 11:05:46.0 +1100
@@ -3328,6 +3328,13 @@ static inline void rotate_runqueue_prior
 	int new_prio_level, remaining_quota = rq_quota(rq, rq->prio_level);
 	struct prio_array *array = rq->active;
 
+	/*
+	 * Make sure we don't have tasks still on the active array that
+	 * haven't run due to not preempting (merging or smp balancing)
+	 */
+	if (find_next_bit(rq->dyn_bitmap, MAX_PRIO, MAX_RT_PRIO) <
+	    rq->prio_level)
+		return;
 	if (rq->prio_level > MAX_PRIO - 2) {
 		/* Major rotation required */
 		struct prio_array *new_queue = rq->expired;

-- 
-ck
Re: Use of absolute timeouts for oneshot timers
Thomas Gleixner wrote: > The clocksource is not used until the clocksource is installed. Also the > periodic mode during boot, when the clock event device supports periodic > mode, is not reading the time. It relies on the clock event device > getting it straight. Yes. This could be one source of error, where I compute the offset hypervisor_time - ktime_get(), but ktime_get() may drift with respect to hypervisor time while using a periodic jiffies timebase. > Once we switch to NO_HZ or HIGHRES the clock event device is directly > coupled to the clock event source. > OK. Erm, but not in the sense that you always choose the xen/hpet/lapic clocksource+clockevent together; there's no direct linkage between the two kinds of device. But there's the coupling where the clocksource is always used to directly measure the clockevent's behaviour. > Once we switched over to the clocksource, everything should be in > perfect sync. > Assuming that the clocksource and the clockevent device have close-enough timebases. >> Or perhaps this is a property of the whole clock subsystem: that >> clockevents must be paired with clocksources. But its not obvious to me >> that this enforced, or even acknowledged. >> > > It's simply enforced in NO_HZ, HIGHRES mode as we operate in absolute > time, which is read back from the clocksource, even if we use a relative > value for real hardware clock event devices to program the next event. > We calculate the delta between the absolute event and now. So we never > get an accumulating error. > Right, but if the clocksource and the clockevent devices have a relative drift, then using the clocksource to compute that we need a 500ns delay, but the clockevent device ends delivering the oneshot event 750ns (or 250ns) later, then things are going to be locally upset, even if the next time the clockevent oneshot is programmed it will take the overshoot into account. 
(Of course, you'd hope the drift would never really be that bad, and 2^32 ns only gives you ~4s window to screw up). > What problem are you observing ? > Unexpected pauses during boot. I think the real problem is that Xen periodic timer events are not delivered unless the vcpu is actually running (ie, they're specifically intended for timeslicing rather than general periodic events). Perhaps the real fix in this case is to just remove the periodic feature flag. J
Re: RSDL v0.28 for 2.6.20
On Sunday 11 March 2007 06:11, Willy Tarreau wrote: > On Sat, Mar 10, 2007 at 01:09:35PM -0500, Stephen Clark wrote: > > Con Kolivas wrote: > > >Here is an update for RSDL to version 0.28 > > > > > >Full patch: > > >http://ck.kolivas.org/patches/staircase-deadline/2.6.20-sched-rsdl-0.28. > > >patch > > > > > >Series: > > >http://ck.kolivas.org/patches/staircase-deadline/2.6.20/ > > > > > >The patch to get you from 0.26 to 0.28: > > >http://ck.kolivas.org/patches/staircase-deadline/2.6.20/sched-rsdl-0.26- > > >0.28.patch > > > > > >A similar patch and directories will be made for 2.6.21-rc3 without > > >further announcement > > > > doesn't apply against 2.6.20.2: > > > > patch -p1 <~/2.6.20-sched-rsdl-0.28.patch --dry-run > > patching file include/linux/list.h > > patching file fs/proc/array.c > > patching file fs/pipe.c > > patching file include/linux/sched.h > > patching file include/asm-generic/bitops/sched.h > > patching file include/asm-s390/bitops.h > > patching file kernel/sched.c > > Hunk #41 FAILED at 3531. > > 1 out of 62 hunks FAILED -- saving rejects to file kernel/sched.c.rej > > patching file include/linux/init_task.h > > patching file Documentation/sched-design.txt > > It is easier to apply 2.6.20.2 on top of 2.6.20+RSDL. The .2 patch > is a one-liner that you can easily fix by hand, and I'm not even > certain that it is still required : > > --- ./kernel/sched.c.orig 2007-03-10 13:03:51 +0100 > +++ ./kernel/sched.c 2007-03-10 13:08:02 +0100 > @@ -3544,7 +3544,7 @@ > next = list_entry(queue->next, struct task_struct, run_list); > } > > - if (dependent_sleeper(cpu, rq, next)) > + if (rq->nr_running == 1 && dependent_sleeper(cpu, rq, next)) > next = rq->idle; > switch_tasks: > if (next == rq->idle) > > BTW, Con, I think that you should base your work on 2.6.20.[23] and not > 2.6.20 next time, due to this conflict. It will get wider adoption. Gotcha. This bugfix for 2.6.20.2 was controversial anyway so it probably wont hurt if you dont apply it. 
Has anyone had any trouble with RSDL on the stable kernels (ie not -mm)? -- -ck
Re: PROBLEM: "Make nenuconfig" does not save parameters.
On Mar 10 2007 23:45, Sam Ravnborg wrote: >> >On Sat, Mar 10, 2007 at 07:23:41PM +0100, Jan Engelhardt wrote: >> >> >> >> Whether the 'working config file path' should change when you do >> >> 'Save as Alternate' or not, is a menuconfig axiom. Ask Sam Ravnborg >> >> if you want it changed :-) >> > >> >Current behaviour is not logical but on the other hand I do not >> >see a big need to make it so. >> >It seems that people very seldom use "save alternate" anyway. >> > >> >But patches are welcome. >> >> ^_^ The patch has already been posted, has it not? > >No. http://lkml.org/lkml/2007/3/10/163 ? Not that I have tried it personally. >Either we keep current behaviour or we change to the "normal" >behaviour with a "Save as..." as known from all other programs. Jan
Re: 2.6.21-rc3-mm1 RSDL results
On Sunday 11 March 2007 10:34, Con Kolivas wrote: > On Sunday 11 March 2007 05:21, Mark Lord wrote: > > Con Kolivas wrote: > > > On Saturday 10 March 2007 05:07, Mark Lord wrote: > > >> Mmm.. when it's good, it's *really* good. > > >> My desktop feels snappier and all of that. > > > > > >.. > > > > > >> But when it's bad, it stinks. > > >> Like when a "make -j2" kernel rebuild is happening in a background > > >> window > > > > > > And that's bad. When you say "it stinks" is it more than 3 times > > > slower? It should be precisely 3 times slower under that load (although > > > low cpu using things like audio won't be affected by running 3 times > > > slower). If it feels like much more than that much slower, there is a > > > bug there somewhere. > > > > Scrolling windows is incredibly jerky, and very very sluggish > > when images are involved (eg. a large web page in firefox). > > > > > As another reader suggested, how does it run with the compile 'niced'? > > > How does it perform with make (without a -j number). > > > > Yes, it behaves itself when the "make -j2" is nice'd. > > > > >> This is on a Pentium-M 760 single-core, w/2GB SDRAM (notebook). > > > > > > What HZ are you running? Are you running a Beryl desktop? > > > > HZ==1000, NO_HZ, Kubuntu Dapper Drake distro, ATI X300 open-source X.org > > driver. > > Can you try the new version of RSDL? Assuming it doesn't oops on you it has > some accounting bugfixes which may have been biting you. Oh I just checked the mesa repo for that driver as well. It seems the r300 drivers have sched_yield in them as well, but not all components. You may be getting bitten by this too. http://webcvs.freedesktop.org/mesa/Mesa/src/mesa/drivers/dri/r300/radeon_ioctl.c?revision=1.14=markup I don't really know what the radeon and other models are so I'm not sure if it applies to your hardware; I just did a random search through the r300 directory. 
-- -ck
Re: Use of absolute timeouts for oneshot timers
On Sat, 2007-03-10 at 14:52 -0800, Jeremy Fitzhardinge wrote: > When booting under Xen, you'll get this if you're using both the xen > clocksource and clockevent drivers. However, it seems that during boot > on a NO_HZ HIGHRES_TIMERS system, the kernel does not use the Xen > clocksource until it switches to highres timer mode. This means that > during boot the kernel's monotonic clock is drifting with respect to the > hypervisor, and all timeouts are unreliable. The clocksource is not used until the clocksource is installed. Also the periodic mode during boot, when the clock event device supports periodic mode, is not reading the time. It relies on the clock event device getting it straight. That's not a big deal during boot and on a kernel with NO_HZ=n and HIGHRES=n the periodic tick only updates jiffies. If the only clocksource is jiffies, then we have to live with it and we do not switch to NO_HZ/HIGHRES as we would lose track of time. Once we switch to NO_HZ or HIGHRES the clock event device is directly coupled to the clock event source. > Initially I was just computing the kernel-hypervisor offset at boot > time, but then I changed it to recompute it every time the timer mode > changes. However, this didn't really help, and I was still getting > unpredictable timeouts during boot. I've changed it to just compute the > hypervisor absolute time directly using the delta each time the oneshot > timer is set, which will definitely be reliable (if the kernel and > hypervisor have drifting timebases then the meaning of Xns delta will be > different, but at least thats a local error rather than a long-term > cumulative error). We do not really care up to the point, where the high resolution clocksource (e.g. TSC, PM-Timer or HPET on real hardware) becomes active. Early boot is fragile and we switch over to high res clocksource and highres/nohz when things have stabilized. 
> My analysis might be wrong here (I suspect the Xen periodic timer may > have unexpected behaviour), but the overall conclusion still stands: > using an absolute timeout only works if the kernel and hypervisor have > non-drifting timebases. I think its too fragile for a clockevent > implementation to assume that a particular clocksource is in use to get > reliable results. Once we switched over to the clocksource, everything should be in perfect sync. > Or perhaps this is a property of the whole clock subsystem: that > clockevents must be paired with clocksources. But its not obvious to me > that this enforced, or even acknowledged. It's simply enforced in NO_HZ, HIGHRES mode as we operate in absolute time, which is read back from the clocksource, even if we use a relative value for real hardware clock event devices to program the next event. We calculate the delta between the absolute event and now. So we never get an accumulating error. What problem are you observing ? tglx - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
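A minimal userspace sketch of the "delta between the absolute event and now" point, under stated assumptions: the event device is modelled as always firing a fixed err_ns late, and all sketch_* names are illustrative rather than kernel APIs. It compares reprogramming each event from an absolute expiry against chaining purely relative periods.

```c
#include <assert.h>
#include <stdint.h>

/* Nanoseconds to program into the (hypothetical) event device. */
static int64_t sketch_next_delta_ns(int64_t abs_expiry_ns, int64_t now_ns)
{
    int64_t delta = abs_expiry_ns - now_ns;
    return delta > 0 ? delta : 0;  /* already expired: fire at once */
}

/*
 * Absolute scheme: every event is programmed as (expiry - now), with
 * "now" read back from the clocksource. Returns how late the final
 * event was delivered.
 */
static int64_t sketch_lateness_absolute(int count, int64_t period_ns,
                                        int64_t err_ns)
{
    int64_t now = 0;     /* simulated clocksource reading */
    int64_t expiry = 0;  /* absolute time of the current event */

    assert(count > 0 && period_ns > 0);
    for (int i = 0; i < count; i++) {
        expiry += period_ns;
        now += sketch_next_delta_ns(expiry, now) + err_ns;
    }
    return now - expiry;
}

/*
 * Relative scheme: each event is simply "one period after the last
 * one fired", so every device error is carried into the next event.
 */
static int64_t sketch_lateness_relative(int count, int64_t period_ns,
                                        int64_t err_ns)
{
    int64_t now = 0;

    assert(count > 0 && period_ns > 0);
    for (int i = 0; i < count; i++)
        now += period_ns + err_ns;
    return now - (int64_t)count * period_ns;
}
```

In this model the absolute scheme stays one err_ns late no matter how many events have passed, while the relative scheme is count * err_ns late. It says nothing about the local error Jeremy raises, where a single delta is itself measured against a drifting timebase.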
Re: 2.6.21-rc3-mm1 RSDL results
On Sunday 11 March 2007 05:21, Mark Lord wrote: > Con Kolivas wrote: > > On Saturday 10 March 2007 05:07, Mark Lord wrote: > >> Mmm.. when it's good, it's *really* good. > >> My desktop feels snappier and all of that. > > > >.. > > > >> But when it's bad, it stinks. > >> Like when a "make -j2" kernel rebuild is happening in a background > >> window > > > > And that's bad. When you say "it stinks" is it more than 3 times slower? > > It should be precisely 3 times slower under that load (although low cpu > > using things like audio won't be affected by running 3 times slower). If > > it feels like much more than that much slower, there is a bug there > > somewhere. > > Scrolling windows is incredibly jerky, and very very sluggish > when images are involved (eg. a large web page in firefox). > > > As another reader suggested, how does it run with the compile 'niced'? > > How does it perform with make (without a -j number). > > Yes, it behaves itself when the "make -j2" is nice'd. > > >> This is on a Pentium-M 760 single-core, w/2GB SDRAM (notebook). > > > > What HZ are you running? Are you running a Beryl desktop? > > HZ==1000, NO_HZ, Kubuntu Dapper Drake distro, ATI X300 open-source X.org > driver. Can you try the new version of RSDL? Assuming it doesn't oops on you it has some accounting bugfixes which may have been biting you. Thanks -- -ck
[RFC] Configuration generic drivers at runtime
Hi everybody, I'm writing a Linux driver for USB Video Class (UVC) devices. Before submitting it to the kernel, there are still a few rough corners I'd like to polish. Comments would be appreciated for the following one. The UVC spec defines a way for device vendors to provide extensions to the standard through so-called extension units, identified by a GUID (Globally Unique IDentifier). An extension unit can define any number of controls (think of controls as simple parameters such as brightness, zoom, pan/tilt, shutter speed, ...). Devices advertise in their USB descriptors the extension units they support, along with the controls that are supported in each extension unit. To access those extension units from user-space, the UVC driver will offer two methods. One of them will map the controls defined by extension units to V4L2 controls. The question that arises is how to define and store those mappings. An obvious solution would be to have an ever growing array in the driver, storing control information for all possible extension units ever defined by webcam vendors. While this is quite straightforward, it might not be the most usable solution for device vendors who wouldn't want debug controls to be included in the kernel by default, or who wouldn't want to submit new control definitions for inclusion in the kernel (with the implied delay) every time a new device comes out. Another solution would be to introduce a way to define controls and mappings at runtime. Mappings would be stored in text-based user-space configuration files, distributed by vendors. A small user-space utility would add them through a few ioctls. This obviously raises some security concerns (regarding which users will be allowed to add mappings, or how many of them they can add). I would like comments regarding the second solution. Is this something that is likely to be accepted in the mainline kernel ? 
I don't know of any other Linux driver implementing this kind of dynamic runtime configuration. Best regards, Laurent Pinchart
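To make the second proposal a little more concrete, here is a minimal userspace sketch of what a runtime mapping table might look like. Everything in it is an assumption for illustration: the struct layout, the sketch_* names, the numeric control id and the cap of 8 entries are hypothetical and are not taken from the UVC driver or from V4L2. It shows a bounded table keyed by (extension unit GUID, control selector), with duplicate and overflow checks standing in for the "how many can they add" concern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cap so that callers cannot add unbounded mappings. */
#define SKETCH_MAX_MAPPINGS 8

/* Hypothetical mapping from one extension unit control to a V4L2 id. */
struct sketch_xu_mapping {
    uint8_t  guid[16];   /* extension unit GUID */
    uint8_t  selector;   /* control selector within that unit */
    uint32_t v4l2_id;    /* illustrative V4L2 control id */
    char     name[32];   /* human-readable control name */
};

struct sketch_mapping_table {
    struct sketch_xu_mapping entries[SKETCH_MAX_MAPPINGS];
    size_t count;
};

/* Look up a mapping by GUID and selector; NULL if none is registered. */
static const struct sketch_xu_mapping *
sketch_find_mapping(const struct sketch_mapping_table *t,
                    const uint8_t guid[16], uint8_t selector)
{
    for (size_t i = 0; i < t->count; i++) {
        const struct sketch_xu_mapping *m = &t->entries[i];
        if (m->selector == selector && memcmp(m->guid, guid, 16) == 0)
            return m;
    }
    return NULL;
}

/*
 * Add a mapping at runtime. Returns 0 on success, -1 if the table is
 * full or the (GUID, selector) pair is already mapped.
 */
static int sketch_add_mapping(struct sketch_mapping_table *t,
                              const struct sketch_xu_mapping *m)
{
    assert(t != NULL && m != NULL);
    if (t->count >= SKETCH_MAX_MAPPINGS)
        return -1;
    if (sketch_find_mapping(t, m->guid, m->selector) != NULL)
        return -1;
    t->entries[t->count++] = *m;
    return 0;
}
```

In a real driver the add path would sit behind an ioctl with a permission check; that part is deliberately left out here.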
Use of absolute timeouts for oneshot timers
I've been thinking a bit more about how useful an absolute timeout is for a oneshot timer in a virtual environment. In principle, absolute times are generally preferable. A relative timeout means "timeout in X ns from now", but the meaning of "now" is ambiguous, particularly if the vcpu can be preempted at any time, which means the determination of "now" can be arbitrarily deferred. However, an absolute time is only meaningful if the kernel and hypervisor are operating off the same timebase (ie, no drift). In general, the kernel's monotonic timer is going to start from 0ns when the virtual machine is booted, and the hypervisor's is going to start at 0ns when the hypervisor is booted. If they're operating off the same timebase, then in principle you can work out a constant offset between the two, and use that for converting a kernel absolute time into a hypervisor absolute time. When booting under Xen, you'll get this if you're using both the xen clocksource and clockevent drivers. However, it seems that during boot on a NO_HZ HIGHRES_TIMERS system, the kernel does not use the Xen clocksource until it switches to highres timer mode. This means that during boot the kernel's monotonic clock is drifting with respect to the hypervisor, and all timeouts are unreliable. Initially I was just computing the kernel-hypervisor offset at boot time, but then I changed it to recompute it every time the timer mode changes. However, this didn't really help, and I was still getting unpredictable timeouts during boot. I've changed it to just compute the hypervisor absolute time directly using the delta each time the oneshot timer is set, which will definitely be reliable (if the kernel and hypervisor have drifting timebases then the meaning of an X ns delta will be different, but at least that's a local error rather than a long-term cumulative error). 
My analysis might be wrong here (I suspect the Xen periodic timer may have unexpected behaviour), but the overall conclusion still stands: using an absolute timeout only works if the kernel and hypervisor have non-drifting timebases. I think it's too fragile for a clockevent implementation to assume that a particular clocksource is in use to get reliable results. Or perhaps this is a property of the whole clock subsystem: that clockevents must be paired with clocksources. But it's not obvious to me that this is enforced, or even acknowledged. (Of course, if the drift can be characterized, then you can compensate for it, but this seems too complex to be the right answer. And drift compensation is numerically much simpler for small 32-bit deltas compared to 64-bit absolute times.) J
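The two conversion strategies discussed in this message can be sketched in a few lines of userspace C. This is only an illustration under assumptions: the sketch_* function names are hypothetical, times are plain signed 64-bit nanosecond counts, and nothing here is the Xen or clockevents code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Strategy 1: a constant offset sampled once (for example at boot)
 * and reused for every conversion. Only correct while the kernel and
 * hypervisor clocks do not drift apart after the sample was taken.
 */
static int64_t sketch_hv_expiry_from_offset(int64_t kernel_expiry_ns,
                                            int64_t offset_ns)
{
    return kernel_expiry_ns + offset_ns;
}

/*
 * Strategy 2: take the delta against the kernel clock at programming
 * time and apply it to the hypervisor clock read at the same moment.
 * Any drift only affects the length of this one delta.
 */
static int64_t sketch_hv_expiry_from_delta(int64_t kernel_expiry_ns,
                                           int64_t kernel_now_ns,
                                           int64_t hv_now_ns)
{
    int64_t delta = kernel_expiry_ns - kernel_now_ns;

    assert(kernel_now_ns >= 0 && hv_now_ns >= 0);
    if (delta < 0)
        delta = 0;  /* already expired: ask for the earliest event */
    return hv_now_ns + delta;
}
```

As a worked case, suppose the offset was sampled as 1 s at boot and the hypervisor clock has since gained 500 us on the kernel clock. A 1 ms timeout converted with the stale offset lands only 500 us after the hypervisor's "now", whereas the delta form lands the full 1 ms after it.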
Re: 2.6.21-rc3-mm1 RSDL results
On Sunday 11 March 2007 04:01, James Cloos wrote: > > "Con" == Con Kolivas <[EMAIL PROTECTED]> writes: > > Con> It's sad that sched_yield is still in our graphics card drivers ... > > I just did a recursive grep(1) on my mirror of the freedesktop git > repos for sched_yield. This only checked the master branches as I > did not bother to script up something to clone each, check out all > branches in turn, and grep(1) each possibility. > > The output is just: > :; grep -r sched_yield FDO/xorg > > FDO/xorg/xserver/hw/kdrive/via/viadraw.c: sched_yield(); > FDO/xorg/driver/xf86-video-glint/src/pm2_video.c:if (sync) /* > sched_yield? */ > > Is there something else I should grep(1) for? If not, it looks as > if sched_yield(2) has been evicted from the drivers. See: http://webcvs.freedesktop.org/mesa/Mesa/src/mesa/drivers/dri/r200/r200_ioctl.c?revision=1.37=markup -- -ck - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Problem: cat < /dev/my_ttyS0 is not blocked
On Saturday 10 March 2007 13:16, Mockern wrote: > I have a problem with cat < /dev/my_ttyS0 (see strace output below). > cat function is not blocked. I don't understand why it is not stopped > at read(0, __ and terminated? > Thank you Because /dev/my_ttyS0 is probably a null file. Please show output of 'ls -l /dev/*ttyS*' -- vda
Re: PROBLEM: "Make nenuconfig" does not save parameters.
On Sat, Mar 10, 2007 at 10:34:41PM +0100, Jan Engelhardt wrote: > > On Mar 10 2007 22:27, Sam Ravnborg wrote: > >On Sat, Mar 10, 2007 at 07:23:41PM +0100, Jan Engelhardt wrote: > >> > >> Whether the 'working config file path' should change when you do > >> 'Save as Alternate' or not, is a menuconfig axiom. Ask Sam Ravnborg > >> if you want it changed :-) > > > >Current behaviour is not logical but on the other hand I do not > >see a big need to make it so. > >It seems that people very seldom use "save alternate" anyway. > > > >But patches are welcome. > > ^_^ The patch has already been posted, has it not? No. Either we keep current behaviour or we change to the "normal" behaviour with a "Save as..." as known from all other programs. Sam
Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...
On Sat, 10 Mar 2007, Nicholas Miell wrote: > > Care to elaborate on why they're a horrible crock? It's a *classic* case of an interface that tries to do everything under the sun. Here's a clue: look at any system call that takes a union as part of its arguments. Count them. I think we have two: - struct siginfo - struct sigevent and they are both broken horrible interfaces where the data structures depend on various flags. It's just not the UNIX system call way. And none of it really makes sense if you already have a file descriptor, since at that point you know what the notification mechanism is. I'd actually much rather do POSIX timers the other way around: associate a generic notification mechanism with the file descriptor, and then implement posix_timer_create() on top of timerfd. Now THAT sounds like a clean unix-like interface ("everything is a file") and would imply that you'd be able to do the same kind of notification for any file descriptor, not just timers. But posix timers as they are done now are just an abomination. They are not unix-like at all. > And are the bugs fixed? If so, why replace them? They work now. .. but the reason for the bugs was largely a very baroque interface, which didn't get fixed (because it's specified by the standard). I'd rather have straightforward interfaces. The timerfd() one looked a lot more straightforward than posix timers. (That said, using "struct itimerspec" might be a good idea. That would also obviate the need for TFD_TIMER_SEQ, since an itimerspec automatically has both "base" and "incremental" parts). Linus
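As a small illustration of the "base" and "incremental" remark, the sketch below computes when a timer described by a POSIX struct itimerspec would fire, taking it_value as the base and it_interval as the increment. The helper names are hypothetical, the expiry is treated as a plain nanosecond count relative to arming, and this is not the timerfd patch under discussion.

```c
#define _POSIX_C_SOURCE 200809L  /* for struct itimerspec in <time.h> */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to a nanosecond count. */
static int64_t sketch_ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Expiry, in ns after arming, of the n-th firing (n = 0 is the first).
 * Returns -1 if the timer is disarmed (zero it_value) or is a
 * one-shot (zero it_interval) and n asks for a later firing.
 */
static int64_t sketch_nth_expiry_ns(const struct itimerspec *its,
                                    unsigned int n)
{
    int64_t base = sketch_ts_to_ns(&its->it_value);     /* "base" part */
    int64_t step = sketch_ts_to_ns(&its->it_interval);  /* "incremental" part */

    assert(base >= 0 && step >= 0);
    if (base == 0)
        return -1;  /* disarmed */
    if (n == 0)
        return base;
    if (step == 0)
        return -1;  /* one-shot: no further firings */
    return base + (int64_t)n * step;
}
```

One structure therefore covers both the one-shot and the periodic case, which is the property the message points to as making a separate sequence flag unnecessary.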
Re: [PATCH][RSDL-mm 0/6] Rotating Staircase DeadLine scheduler for -mm
On Sunday 11 March 2007 03:53, Nicolas Mailhot wrote: > Le dimanche 11 mars 2007 à 01:03 +1100, Con Kolivas a écrit : > > On Saturday 10 March 2007 22:49, Nicolas Mailhot wrote: > > > Oops > > > > > > ⇒ http://bugzilla.kernel.org/show_bug.cgi?id=8166 > > > > Thanks very much. I can't get your config to boot on qemu, but could you > > please try this debugging patch? It's not a patch you can really run the > > machine with but might find where the problem occurs. Specifically I'm > > looking for the warning MISSING STATIC BIT in your case. > > > > http://ck.kolivas.org/patches/crap/sched-rsdl-0.28-stuff.patch > > I attached a screenshot of the patched kernel boot Thanks. Darn the debugging didn't catch anything. Did you see any BUG during the boot earlier than that screenshot? Probably not. If you have the time I would appreciate you testing 2.6.20 with the rsdl 0.28 patch for it with a config as close to this -mm2 one as possible. http://ck.kolivas.org/patches/staircase-deadline/2.6.20-sched-rsdl-0.28.patch and see if the bug recurs please? Thanks! -- -ck - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...
On Sat, 10 Mar 2007, Nicholas Miell wrote: > I never complained about one timer per fd (although, now that you > mention it, that would get a bit excessive if you have thousands of > outstanding timers). Right, of course. > > The real-time and monotonic selection can be added. > > IOW, the timerfd patch is not suitable for inclusion as-is. (While > you're at it, you should probably add a flags argument for future > expansion.) That's already in. > > If you look at the posix timers code, that's a bunch of code over the real > > meat of it, that is hrtimer.c. The timerfd interface goes straight to > > that, without adding yet another meaning to the sigevent structure, > > That's what the sigevent structure is for -- to describe how events > should be signaled to userspace, whether by signal delivery, thread > creation, or queuing to event completion ports. If you think > extending it would be bad, I can show you the line in POSIX where it > encourages the contrary. I'm sorry, I already explained to you that linking the two (files and posix timers) is going to create more trouble than it actually solves. The timerfd code provides the same functionality, with zero intrusion in existing code, and basically zero code (once you remove the usual fd creation/cleanup). The code for adding posix timers support would be *all* the existing one (that is already a thin wrapper that calls hrtimer.c support - like posix timers do), plus adding more crud into the posix timers code, plus adding file references handling. If *you* want to do that, I can open you a door into the timerfd. - Davide
Re: [PATCH] Software Suspend: Fix suspend when console is in VT_AUTO/KD_GRAPHICS mode
Hi! > > It should explain why it is okay to proceed when we can't change to > > text console. > > > See updated comment in attached patch. It's really up to the caller to > decide what to do if we can't switch the console - currently all callers > ignore the return code so I assume that it's okay to proceed anyway. Ok, I guess the patch is the right thing to do after all. Fix the issues below, append a changelog, and send a patch to lkml, cc Andrew Morton and me. Oh and you have my ACK. Pavel > Signed-off-by: Andrew Johnson <[EMAIL PROTECTED]> > --- > diff -rup linux-2.6.20.1/drivers/char/vt.c linux/drivers/char/vt.c > --- linux-2.6.20.1/drivers/char/vt.c 2007-02-19 22:34:32.0 -0800 > +++ linux/drivers/char/vt.c 2007-03-09 15:48:29.0 -0800 > @@ -2188,10 +2188,30 @@ static void console_callback(struct work > release_console_sem(); > } > > -void set_console(int nr) > +extern char vt_dont_switch; > + > +int set_console(int nr) > { > + struct vc_data *vc = vc_cons[fg_console].d; > + > + if(!vc_cons_allocated(nr) || vt_dont_switch || There should be a space between "if" and "(". > diff -rup linux-2.6.20.1/drivers/char/vt_ioctl.c > linux/drivers/char/vt_ioctl.c > --- linux-2.6.20.1/drivers/char/vt_ioctl.c2007-02-19 22:34:32.0 > -0800 > +++ linux/drivers/char/vt_ioctl.c 2007-03-08 14:15:41.0 And your mailer still wordwraps. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [ck] Re: RSDL v0.28 for 2.6.20
On 3/10/07, Willy Tarreau <[EMAIL PROTECTED]> wrote: On Sat, Mar 10, 2007 at 04:56:57PM -0500, michael chang wrote: > On 3/10/07, Willy Tarreau <[EMAIL PROTECTED]> wrote: > >BTW, Con, I think that you should base your work on 2.6.20.[23] and not > >2.6.20 next time, due to this conflict. It will get wider adoption. ^^ > Maybe I'm naive, but I find this hard to understand -- 2.6.20.2 didn't > exist when Con published his patch. (Con published it ~12 hours before > the release of 2.6.20.2, from what I can tell.) How can he base his > work on something that didn't yet exist? (And it applied cleanly to > 2.6.20.1, the latest when he published it.) You see the words I have underlined ? "next time". I know for sure he published it before 2.6.20.2, but now that it is out, I suggested that Con rebases his work on this version for new releases. Oh. That's my mistake, then. That makes sense. To me, it sounded like you were implying he was supposed to base it on 2.6.20.2 in advance, for some reason. *sigh* -- ~Mike - Just the crazy copy cat. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Delete unused header file.
Delete apparently unused header file include/linux/elfnote.h. Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]> --- not sure who's responsible for this. diff --git a/include/linux/elfnote.h b/include/linux/elfnote.h deleted file mode 100644 index 67396db..000 --- a/include/linux/elfnote.h +++ /dev/null @@ -1,90 +0,0 @@ -#ifndef _LINUX_ELFNOTE_H -#define _LINUX_ELFNOTE_H -/* - * Helper macros to generate ELF Note structures, which are put into a - * PT_NOTE segment of the final vmlinux image. These are useful for - * including name-value pairs of metadata into the kernel binary (or - * modules?) for use by external programs. - * - * Each note has three parts: a name, a type and a desc. The name is - * intended to distinguish the note's originator, so it would be a - * company, project, subsystem, etc; it must be in a suitable form for - * use in a section name. The type is an integer which is used to tag - * the data, and is considered to be within the "name" namespace (so - * "FooCo"'s type 42 is distinct from "BarProj"'s type 42). The - * "desc" field is the actual data. There are no constraints on the - * desc field's contents, though typically they're fairly small. - * - * All notes from a given NAME are put into a section named - * .note.NAME. When the kernel image is finally linked, all the notes - * are packed into a single .notes section, which is mapped into the - * PT_NOTE segment. Because notes for a given name are grouped into - * the same section, they'll all be adjacent the output file. - * - * This file defines macros for both C and assembler use. Their - * syntax is slightly different, but they're semantically similar. - * - * See the ELF specification for more detail about ELF notes. - */ - -#ifdef __ASSEMBLER__ -/* - * Generate a structure with the same shape as Elf{32,64}_Nhdr (which - * turn out to be the same size and shape), followed by the name and - * desc data with appropriate padding. 
The 'desctype' argument is the - * assembler pseudo op defining the type of the data e.g. .asciz while - * 'descdata' is the data itself e.g. "hello, world". - * - * e.g. ELFNOTE(XYZCo, 42, .asciz, "forty-two") - * ELFNOTE(XYZCo, 12, .long, 0xdeadbeef) - */ -#define ELFNOTE(name, type, desctype, descdata)\ -.pushsection .note.name; \ - .align 4 ; \ - .long 2f - 1f/* namesz */; \ - .long 4f - 3f/* descsz */; \ - .long type ; \ -1:.asciz "name"; \ -2:.align 4 ; \ -3:desctype descdata; \ -4:.align 4 ; \ -.popsection; -#else /* !__ASSEMBLER__ */ -#include -/* - * Use an anonymous structure which matches the shape of - * Elf{32,64}_Nhdr, but includes the name and desc data. The size and - * type of name and desc depend on the macro arguments. "name" must - * be a literal string, and "desc" must be passed by value. You may - * only define one note per line, since __LINE__ is used to generate - * unique symbols. - */ -#define _ELFNOTE_PASTE(a,b)a##b -#define _ELFNOTE(size, name, unique, type, desc) \ - static const struct { \ - struct elf##size##_note _nhdr; \ - unsigned char _name[sizeof(name)] \ - __attribute__((aligned(sizeof(Elf##size##_Word; \ - typeof(desc) _desc \ - __attribute__((aligned(sizeof(Elf##size##_Word; \ - } _ELFNOTE_PASTE(_note_, unique)\ - __attribute_used__ \ - __attribute__((section(".note." name), \ - aligned(sizeof(Elf##size##_Word)), \ - unused)) = { \ - { \ - sizeof(name), \ - sizeof(desc), \ - type, \ - }, \ - name, \ - desc\ - } -#define ELFNOTE(size, name, type, desc)\ - _ELFNOTE(size, name, __LINE__, type, desc) - -#define ELFNOTE32(name, type, desc) ELFNOTE(32, name, type, desc) -#define ELFNOTE64(name, type, desc) ELFNOTE(64, name, type, desc) -#endif /* __ASSEMBLER__ */ - -#endif /* _LINUX_ELFNOTE_H */ --
Re: [PATCH] Software Suspend: Fix suspend when console is in VT_AUTO/KD_GRAPHICS mode
Hi! > > ...how does qpe know when to repaint the screen, anyway? > > QPE doesn't need to repaint the screen after wake-up - the framebuffer > memory is retained so the PXA270 lcd controller simply displays what was > last on the screen when it is re-enabled. That probably means QPE is broken on machines that do not preserve framebuffer over suspend :-(. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] rtc: Add RTC class driver for the Maxim MAX6900
From: Dale Farnsworth <[EMAIL PROTECTED]> Signed-off-by: Dale Farnsworth.org <[EMAIL PROTECTED] --- drivers/rtc/Kconfig | 10 + drivers/rtc/Makefile |1 drivers/rtc/rtc-max6900.c | 312 3 files changed, 323 insertions(+) Index: linux-2.6-powerpc-df/drivers/rtc/Kconfig === --- linux-2.6-powerpc-df.orig/drivers/rtc/Kconfig +++ linux-2.6-powerpc-df/drivers/rtc/Kconfig @@ -334,6 +334,16 @@ config RTC_DRV_TEST This driver can also be built as a module. If so, the module will be called rtc-test. +config RTC_DRV_MAX6900 + tristate "Maxim 6900" + depends on RTC_CLASS && I2C + help + If you say yes here you will get support for the + Maxim MAX6900 I2C RTC chip. + + This driver can also be built as a module. If so, the module + will be called rtc-max6900. + config RTC_DRV_MAX6902 tristate "Maxim 6902" depends on RTC_CLASS && SPI Index: linux-2.6-powerpc-df/drivers/rtc/Makefile === --- linux-2.6-powerpc-df.orig/drivers/rtc/Makefile +++ linux-2.6-powerpc-df/drivers/rtc/Makefile @@ -34,6 +34,7 @@ obj-$(CONFIG_RTC_DRV_EP93XX) += rtc-ep93 obj-$(CONFIG_RTC_DRV_SA1100) += rtc-sa1100.o obj-$(CONFIG_RTC_DRV_VR41XX) += rtc-vr41xx.o obj-$(CONFIG_RTC_DRV_PL031)+= rtc-pl031.o +obj-$(CONFIG_RTC_DRV_MAX6900) += rtc-max6900.o obj-$(CONFIG_RTC_DRV_MAX6902) += rtc-max6902.o obj-$(CONFIG_RTC_DRV_V3020)+= rtc-v3020.o obj-$(CONFIG_RTC_DRV_AT91RM9200)+= rtc-at91rm9200.o Index: linux-2.6-powerpc-df/drivers/rtc/rtc-max6900.c === --- /dev/null +++ linux-2.6-powerpc-df/drivers/rtc/rtc-max6900.c @@ -0,0 +1,312 @@ +/* + * rtc class driver for the Maxim MAX6900 chip + * + * Author: Dale Farnsworth <[EMAIL PROTECTED]> + * + * based on previously existing rtc class drivers + * + * 2007 (c) MontaVista, Software, Inc. This file is licensed under + * the terms of the GNU General Public License version 2. This program + * is licensed "as is" without any warranty of any kind, whether express + * or implied. 
+ */
+
+#include
+#include
+#include
+#include
+#include
+
+#define DRV_NAME "max6900"
+#define DRV_VERSION "0.1"
+
+/*
+ * register indices
+ */
+#define MAX6900_REG_SC                  0       /* seconds      00-59 */
+#define MAX6900_REG_MN                  1       /* minutes      00-59 */
+#define MAX6900_REG_HR                  2       /* hours        00-23 */
+#define MAX6900_REG_DT                  3       /* day of month 00-31 */
+#define MAX6900_REG_MO                  4       /* month        01-12 */
+#define MAX6900_REG_DW                  5       /* day of week  1-7 */
+#define MAX6900_REG_YR                  6       /* year         00-99 */
+#define MAX6900_REG_CT                  7       /* control */
+#define MAX6900_REG_LEN                 8
+
+#define MAX6900_REG_CT_WP               (1 << 7)        /* Write Protect */
+
+/*
+ * register read/write commands
+ */
+#define MAX6900_REG_CONTROL_WRITE       0x8e
+#define MAX6900_REG_BURST_READ          0xbf
+#define MAX6900_REG_BURST_WRITE         0xbe
+#define MAX6900_REG_RESERVED_READ       0x96
+
+#define MAX6900_IDLE_TIME_AFTER_WRITE   3       /* specification says 2.5 mS */
+
+#define MAX6900_I2C_ADDR                0xa0
+
+static unsigned short normal_i2c[] = {
+        MAX6900_I2C_ADDR >> 1,
+        I2C_CLIENT_END
+};
+
+I2C_CLIENT_INSMOD;                      /* defines addr_data */
+
+static int max6900_probe(struct i2c_adapter *adapter, int addr, int kind);
+
+static int max6900_i2c_read_regs(struct i2c_client *client, u8 *buf)
+{
+        u8 reg_addr[1] = { MAX6900_REG_BURST_READ };
+        struct i2c_msg msgs[2] = {
+                {
+                        client->addr,
+                        0, /* write */
+                        sizeof(reg_addr),
+                        reg_addr
+                },
+                {
+                        client->addr,
+                        I2C_M_RD,
+                        MAX6900_REG_LEN,
+                        buf
+                }
+        };
+        int rc;
+
+        rc = i2c_transfer(client->adapter, msgs, ARRAY_SIZE(msgs));
+        if (rc != ARRAY_SIZE(msgs)) {
+                dev_err(&client->dev, "%s: register read failed\n",
+                        __FUNCTION__);
+                return -EIO;
+        }
+        return 0;
+}
+
+static int max6900_i2c_write_regs(struct i2c_client *client, u8 const *buf)
+{
+        u8 i2c_buf[MAX6900_REG_LEN + 1] = { MAX6900_REG_BURST_WRITE };
+        struct i2c_msg msgs[1] = {
+                {
+                        client->addr,
+                        0, /* write */
+                        MAX6900_REG_LEN + 1,
+                        i2c_buf
+                }
+        };
+        int rc;
+
+        memcpy(&i2c_buf[1], buf, MAX6900_REG_LEN);
+
+
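
A note on two details visible in the part of the driver quoted above, sketched as plain userspace C rather than kernel code. First, normal_i2c[] stores MAX6900_I2C_ADDR >> 1, which drops the read/write bit of the 8-bit bus address 0xa0 and leaves the 7-bit address. Second, max6900_i2c_write_regs() sends a single message whose first byte is the burst-write command and whose remaining MAX6900_REG_LEN bytes are the register values. The helper names i2c_addr_7bit() and build_burst_write() are made up for this illustration and are not part of the patch; the constants are copied from it.

```c
#include <stdint.h>
#include <string.h>

#define REG_LEN          8      /* MAX6900_REG_LEN in the patch */
#define BURST_WRITE_CMD  0xbe   /* MAX6900_REG_BURST_WRITE in the patch */
#define ADDR_8BIT        0xa0   /* MAX6900_I2C_ADDR in the patch */

/* Hypothetical helper: drop the R/W bit (bit 0) to get the 7-bit address. */
static uint8_t i2c_addr_7bit(uint8_t addr_8bit)
{
        return addr_8bit >> 1;
}

/*
 * Hypothetical helper: lay out the burst-write payload the same way the
 * quoted max6900_i2c_write_regs() fills i2c_buf: command byte first,
 * then the REG_LEN register bytes. Returns the payload length.
 * 'out' must have room for REG_LEN + 1 bytes.
 */
static size_t build_burst_write(uint8_t *out, const uint8_t *regs)
{
        out[0] = BURST_WRITE_CMD;
        memcpy(&out[1], regs, REG_LEN);
        return REG_LEN + 1;
}
```

With these, i2c_addr_7bit(ADDR_8BIT) gives 0x50, and the payload built from eight register bytes is nine bytes long with 0xbe in front.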
[PATCH] swsusp: Fix resume error path in platform mode
From: Rafael J. Wysocki <[EMAIL PROTECTED]>

If swsusp is using the platform mode during the resume and the image cannot
be read, the platform mode should be switched off before software_resume()
returns. Make it happen.

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
Acked-by: Pavel Machek <[EMAIL PROTECTED]>
---
 kernel/power/disk.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6.21-rc3/kernel/power/disk.c
===
--- linux-2.6.21-rc3.orig/kernel/power/disk.c
+++ linux-2.6.21-rc3/kernel/power/disk.c
@@ -260,6 +260,7 @@ static int software_resume(void)
         error = swsusp_read();
         if (error) {
                 swsusp_free();
+                platform_finish();
                 goto Thaw;
         }
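
For anyone reading this without the surrounding source, the shape of the fix can be sketched in userspace C. This is only a model under stated assumptions: struct resume_state and software_resume_sketch() are invented names, the two flags stand in for whatever swsusp_free() and platform_finish() actually undo, and the sketch assumes the platform mode was already switched on earlier in the function, as the description above implies.

```c
#include <stdbool.h>

/* Hypothetical model of the state touched on the resume error path. */
struct resume_state {
        bool image_allocated;   /* stands in for what swsusp_free() releases */
        bool platform_active;   /* stands in for what platform_finish() undoes */
};

/*
 * Hypothetical sketch of the error path after the patch. 'read_error'
 * plays the role of the return value of swsusp_read(): zero on success,
 * a negative errno-style code on failure.
 */
static int software_resume_sketch(struct resume_state *s, int read_error)
{
        /* Assumed: both were set up before the image is read. */
        s->image_allocated = true;
        s->platform_active = true;

        if (read_error) {
                s->image_allocated = false;     /* swsusp_free() */
                s->platform_active = false;     /* platform_finish(), the added call */
                return read_error;              /* "goto Thaw" in the real code */
        }
        return 0;
}
```

Without the added line, the failing case would return with platform_active still set, which is the leftover state the patch removes.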
[PATCH] CRIS: Delete unused header file.
Delete apparently unused header file drivers/serial/crisv10.h.

Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]>

---

diff --git a/drivers/serial/crisv10.h b/drivers/serial/crisv10.h
deleted file mode 100644
index 4a23340..000
--- a/drivers/serial/crisv10.h
+++ /dev/null
@@ -1,136 +0,0 @@
-/*
- * serial.h: Arch-dep definitions for the Etrax100 serial driver.
- *
- * Copyright (C) 1998, 1999, 2000 Axis Communications AB
- */
-
-#ifndef _ETRAX_SERIAL_H
-#define _ETRAX_SERIAL_H
-
-#include
-#include
-
-/* Software state per channel */
-
-#ifdef __KERNEL__
-/*
- * This is our internal structure for each serial port's state.
- *
- * Many fields are paralleled by the structure used by the serial_struct
- * structure.
- *
- * For definitions of the flags field, see tty.h
- */
-
-#define SERIAL_RECV_DESCRIPTORS 8
-
-struct etrax_recv_buffer {
-        struct etrax_recv_buffer *next;
-        unsigned short length;
-        unsigned char error;
-        unsigned char pad;
-
-        unsigned char buffer[0];
-};
-
-struct e100_serial {
-        int baud;
-        volatile u8 *port; /* R_SERIALx_CTRL */
-        u32 irq; /* bitnr in R_IRQ_MASK2 for dmaX_descr */
-
-        /* Output registers */
-        volatile u8 *oclrintradr; /* adr to R_DMA_CHx_CLR_INTR */
-        volatile u32 *ofirstadr; /* adr to R_DMA_CHx_FIRST */
-        volatile u8 *ocmdadr; /* adr to R_DMA_CHx_CMD */
-        const volatile u8 *ostatusadr; /* adr to R_DMA_CHx_STATUS */
-
-        /* Input registers */
-        volatile u8 *iclrintradr; /* adr to R_DMA_CHx_CLR_INTR */
-        volatile u32 *ifirstadr; /* adr to R_DMA_CHx_FIRST */
-        volatile u8 *icmdadr; /* adr to R_DMA_CHx_CMD */
-        volatile u32 *idescradr; /* adr to R_DMA_CHx_DESCR */
-
-        int flags; /* defined in tty.h */
-
-        u8 rx_ctrl; /* shadow for R_SERIALx_REC_CTRL */
-        u8 tx_ctrl; /* shadow for R_SERIALx_TR_CTRL */
-        u8 iseteop; /* bit number for R_SET_EOP for the input dma */
-        int enabled; /* Set to 1 if the port is enabled in HW config */
-
-        u8 dma_out_enabled:1; /* Set to 1 if DMA should be used */
-        u8 dma_in_enabled:1; /* Set to 1 if DMA should be used */
-
-        /* end of fields defined in rs_table[] in .c-file */
-        u8 uses_dma_in; /* Set to 1 if DMA is used */
-        u8 uses_dma_out; /* Set to 1 if DMA is used */
-        u8 forced_eop; /* a fifo eop has been forced */
-        int baud_base; /* For special baudrates */
-        int custom_divisor; /* For special baudrates */
-        struct etrax_dma_descr tr_descr;
-        struct etrax_dma_descr rec_descr[SERIAL_RECV_DESCRIPTORS];
-        int cur_rec_descr;
-
-        volatile int tr_running; /* 1 if output is running */
-
-        struct tty_struct *tty;
-        int read_status_mask;
-        int ignore_status_mask;
-        int x_char; /* xon/xoff character */
-        int close_delay;
-        unsigned short closing_wait;
-        unsigned short closing_wait2;
-        unsigned long event;
-        unsigned long last_active;
-        int line;
-        int type; /* PORT_ETRAX */
-        int count; /* # of fd on device */
-        int blocked_open; /* # of blocked opens */
-        struct circ_buf xmit;
-        struct etrax_recv_buffer *first_recv_buffer;
-        struct etrax_recv_buffer *last_recv_buffer;
-        unsigned int recv_cnt;
-        unsigned int max_recv_cnt;
-
-        struct work_struct work;
-        struct async_icount icount; /* error-statistics etc.*/
-        struct ktermios normal_termios;
-        struct ktermios callout_termios;
-#ifdef DECLARE_WAITQUEUE
-        wait_queue_head_t open_wait;
-        wait_queue_head_t close_wait;
-#else
-        struct wait_queue *open_wait;
-        struct wait_queue *close_wait;
-#endif
-
-        unsigned long char_time_usec; /* The time for 1 char, in usecs */
-        unsigned long flush_time_usec; /* How often we should flush */
-        unsigned long last_tx_active_usec; /* Last tx usec in the jiffies */
-        unsigned long last_tx_active; /* Last tx time in jiffies */
-        unsigned long last_rx_active_usec; /* Last rx usec in the jiffies */
-        unsigned long last_rx_active; /* Last rx time in jiffies */
-
-        int