2.4.3 fails to boot with initrd - solved
PROBLEM: kernel 2.4.3 will not boot on systems with initrd files

DESCRIPTION: Building kernel 2.4.3 and attempting to boot it failed. The problem turned out to be in the modutils-2.4.5 rpm for i386.

DETAIL: After building the 2.4.3 kernel and moving the boot modules to the initrd image, it was noted that the system stopped when trying to load modules for the root filesystem device. The first solution attempted was to get the i386 rpm from kernel.org for the latest (2.4.5) modutils, install it, and copy the insmod program to the initrd image. This fails, with the message "insmod: no such program" at boot. Examination showed that the binary provided was not statically linked. Got the source from kernel.org and built it. By default this still isn't statically linked! Changed the common Makefile to set LDFLAGS to "-static -s" and built again. After install and copy to the initrd image this resulted in a bootable system. While it is possible to copy the needed libraries to the initrd image, it then becomes larger than the default ramdisk size (at least on my system). And building the drivers into the kernel hurts portability and makes the kernel too large to boot from floppy.

SYSTEMS AFFECTED: Red Hat 7.x and similar distributions using configurations which have the root device driver loaded from modules.

SUGGESTED FIX: None needed, but the kernel "Changes" file should include a note that people using initrd will need to rebuild it statically, along with the note that a newer modutils is needed. Even for people who build their own initrd files, this is NOT obvious!

--
bill davidsen [EMAIL PROTECTED]
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
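A note on checking: the failure above came from a dynamically linked insmod. A quick test along these lines would distinguish the two cases before a reboot; this is only a sketch, and the `check_static` helper and the /bin/ls stand-in are illustrative, not from the post:

```shell
# Check whether a binary is statically linked before copying it into an
# initrd image. ldd reports "not a dynamic executable" for static
# binaries; /bin/ls here is only a stand-in for the insmod binary.
check_static() {
    if ldd "$1" 2>&1 | grep -qi 'not a dynamic executable'; then
        echo "$1: static, safe for a bare initrd"
    else
        echo "$1: DYNAMIC, will fail inside a bare initrd"
    fi
}
check_static /bin/ls    # on most glibc systems this reports DYNAMIC
```

Running the same check against the insmod shipped in the modutils rpm would have caught the problem before the failed boot.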
Documentation glitch in 2.4
The Config help for kernel automount indicates that the pointer to user code is in the Documentation/Changes file for autofs. As far as I can tell that isn't the case. Since search engines seem to be better at finding the BSD and 2.2 software, it would be nice if the information were restored along with all the other "get it here" info.

--
bill davidsen [EMAIL PROTECTED]
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
Re: sched_yield proposals/rationale
Mark Lord wrote: [EMAIL PROTECTED] wrote: From: Bill Davidsen

And having gotten same, are you going to code up what appears to be a solution, based on this feedback?

The feedback was helpful in verifying whether there are any arguments against my approach. The real proof is in the pudding. I'm running a kernel with these changes as we speak. Overall system throughput is up about 20%. By 'system throughput' I mean measured performance of a rather large (experimental) system. The patch isn't even 24h old... Also the application latency has improved.

Cool. You *do know* that there is a brand new CPU scheduler scheduled to replace the current one for the 2.6.22 kernel, right?

Having tried both nicksched and Con's fair sched on some normal loads, as opposed to benchmarks, I sure hope Linus changes his mind about having several schedulers in the kernel. The one perfect and self-adjusting scheduler isn't here yet.

--
Bill Davidsen [EMAIL PROTECTED]
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: [PATCH][RFC] Kill off legacy power management stuff.
Rafael J. Wysocki wrote: [appropriate CCs added] On Friday, 13 April 2007 02:33, Robert P. J. Day wrote: just something i threw together, not in final form, but it represents tossing the legacy PM stuff. at the moment, the menuconfig entry for PM_LEGACY lists it as DEPRECATED, while the help screen calls it obsolete. that's a good sign that it's getting close to the time for it to go, and the removal is fairly straightforward, but there's no mention of its removal in the feature removal schedule file. It's been like this for a long long time. I think you're right that it can be dropped, but I don't know the details (eg. why it hasn't been dropped yet).

One reason was that there are (were?) a number of machines which only powered down properly using apm. It was discussed as part of shutting down after power failure when your UPS is running out of power. I haven't checked on that in a while, I'm just supplying one reason since you wondered.

--
Bill Davidsen [EMAIL PROTECTED]
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: [PATCH][RFC] Kill off legacy power management stuff.
Robert P. J. Day wrote: On Tue, 17 Apr 2007, Bill Davidsen wrote: Rafael J. Wysocki wrote: [appropriate CCs added] On Friday, 13 April 2007 02:33, Robert P. J. Day wrote: just something i threw together, not in final form, but it represents tossing the legacy PM stuff. at the moment, the menuconfig entry for PM_LEGACY lists it as DEPRECATED, while the help screen calls it obsolete. that's a good sign that it's getting close to the time for it to go, and the removal is fairly straightforward, but there's no mention of its removal in the feature removal schedule file. It's been like this for a long long time. I think you're right that it can be dropped, but I don't know the details (eg. why it hasn't been dropped yet). One reason was that there are (were?) a number of machines which only powered down properly using apm. It was discussed as part of shutting down after power failure when your UPS is running out of power.

um ... what does APM have to do with legacy PM? two different issues, no?

Since the patches are going into apm.c, and apm was used for suspend and poweroff before ACPI was a feature of the hardware, I assume there's a relationship. As of 2.6.9 ACPI still couldn't power down one of my old boxes; that box hasn't been updated since, so I can't say what later kernels will do.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Re: Kaffeine problem with CFS
S.Çağlar Onur wrote: On Wednesday, 18 April 2007, Ingo Molnar wrote: * S.Çağlar Onur [EMAIL PROTECTED] wrote:

- schedule();
+ msleep(1);

which Ingo sent me to try also has the same effect for me. I cannot reproduce hangs anymore with that patch applied on top of CFS while one console checks out SVN repos and another compiles a small test program.

great! Could you please unapply the hack above and try the proper fix below, does this one solve the hangs too?

Instead of that one, i tried CFSv3 and i cannot reproduce the hang anymore. Thanks!...

And that explains why CFS-v3 on 21-rc7-git3 wouldn't show me the hang. As a matter of fact, nothing I did showed any bad behavior! Note that I was doing actual badly behaved things which do sometimes glitch the standard scheduler, not running benchmarks. This scheduler is boring: everything works. I am going to try some tests on a uniprocessor, though; I have been running everything on either SMP or HT CPUs. But so far it looks fine.

--
Bill Davidsen [EMAIL PROTECTED]
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Mike Galbraith wrote: On Tue, 2007-04-17 at 05:40 +0200, Nick Piggin wrote: On Tue, Apr 17, 2007 at 04:29:01AM +0200, Mike Galbraith wrote: Yup, and progress _is_ happening now, quite rapidly.

Progress as in progress on Ingo's scheduler. I still don't know how we'd decide when to replace the mainline scheduler or with what. I don't think we can say Ingo's is better than the alternatives, can we?

No, that would require massive performance testing of all alternatives. If there is some kind of bakeoff, then I'd like one of Con's designs to be involved, and mine, and Peter's...

The trouble with a bakeoff is that it's pretty darn hard to get people to test in the first place, and then comes weighting the subjective and hard performance numbers. If they're close in numbers, do you go with the one which starts the fewest flamewars, or what?

Here we disagree... I picked a scheduler not by running benchmarks, but by running loads which piss me off with the mainline scheduler. And then I ran the other schedulers for a while to find the things, normal things I do, which resulted in bad behavior. And when I found one which had (so far) no such cases I called it my winner; but I haven't tested it under server load, so I can't begin to say it's the best.

What we need is for lots of people to run every scheduler in real life, and do worst-case analysis by finding the cases which cause bad behavior. And if there were a way to easily choose another scheduler, call it pluggable, modular, or Russian Roulette, people who found a worst case would report it (aka bitch about it) and try another. But the average user is better able to boot with an option like sched=cfs (or sc, or nick, or ...) than to patch and build a kernel. So if we don't get easily switched schedulers, people will not test nearly as well. The best scheduler isn't the one 2% faster than the rest, it's the one with the fewest jackpot cases where it sucks.
And if the mainline had multiple schedulers this testing would get done; authors would get more reports and have a better chance of fixing corner cases. Note that we really need multiple schedulers to make people happy, because fairness is not the most desirable behavior on all machines, and adding knobs probably isn't the answer. I want a server to degrade gently; I want my desktop to show my movie and echo my typing, and if that's hard on compiles or the file transfer, so be it. Con doesn't want to compromise his goals; I agree, but want to have an option for when I don't share them.

--
Bill Davidsen [EMAIL PROTECTED]
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
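The boot-time selection argued for above might look like this in a boot loader entry. Note this is purely hypothetical: `sched=cfs` is the wished-for option under discussion, not a parameter any mainline 2.6 kernel accepted, and the file paths are placeholders.

```
# Hypothetical /boot/grub/grub.conf stanza -- the sched= option is the
# proposal being discussed, NOT an existing mainline kernel parameter.
title Linux 2.6.21 (CFS)
    root (hd0,0)
    kernel /vmlinuz-2.6.21 ro root=/dev/sda2 sched=cfs
    initrd /initrd-2.6.21.img
```

The point of the proposal is exactly this: switching schedulers becomes a one-word edit and a reboot, rather than patching and rebuilding a kernel.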
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Ingo Molnar wrote: ( Lets be cautious though: the jury is still out whether people actually like this more than the current approach. While CFS feedback looks promising after a whopping 3 days of it being released [ ;-) ], the test coverage of all 'fairness centric' schedulers, even considering years of availability, is less than 1% i'm afraid, and that 1% was mostly self-selecting. )

All of my testing has been on desktop machines, although in most cases they were really loaded desktops which had load avg 10..100 from time to time, and none were low-memory machines. Up to CFS v3 I thought nicksched was my winner; now CFSv3 looks better, by not having stumbles under stupid loads. I have not tested:
1 - server loads, nntp, smtp, etc
2 - low memory machines
3 - uniprocessor systems
I think this should be done before drawing conclusions. Or if someone has tried this, perhaps they would report what they saw. People are talking about smoothness, but not how many pages per second come out of their overloaded web server.

--
Bill Davidsen [EMAIL PROTECTED]
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Matt Mackall wrote: On Wed, Apr 18, 2007 at 08:37:11AM +0200, Nick Piggin wrote:

[2] It's trivial to construct two or more perfectly reasonable and desirable definitions of fairness that are mutually incompatible.

Probably not if you use common sense, and in the context of a replacement for the 2.6 scheduler.

Ok, trivial example. You cannot allocate equal CPU time to processes/tasks and simultaneously allocate equal time to thread groups. Is it common sense that a heavily-threaded app should be able to get hugely more CPU than a well-written app? No. I don't want Joe's stupid Java app to make my compile crawl. On the other hand, if my heavily threaded app is, say, a voicemail server serving 30 customers, I probably want it to get 30x the CPU of my gzip job.

Matt, you tickled a thought... on one hand we have a single user running a threaded application, and it ideally should get the same total CPU as a user running a single-thread process. On the other hand we have a threaded application, call it sendmail, nnrpd, httpd, bind, whatever. In that case each thread is really providing service for an independent user, and should get an appropriate share of the CPU. Perhaps the solution is to add a means for identifying server processes: by capability, or by membership in a server group, or by having the initiating process set some flag at exec() time. That doesn't necessarily solve the problems, but it may provide more information to allow them to be solvable.

--
Bill Davidsen [EMAIL PROTECTED]
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Linus Torvalds wrote: On Wed, 18 Apr 2007, Matt Mackall wrote:

Why is X special? Because it does work on behalf of other processes? Lots of things do this. Perhaps a scheduler should focus entirely on the implicit and directed wakeup matrix and optimizing that instead[1].

I 100% agree - the perfect scheduler would indeed take into account where the wakeups come from, and try to weigh processes that help other processes make progress more. That would naturally give server processes more CPU power, because they help others. I don't believe for a second that fairness means "give everybody the same amount of CPU". That's a totally illogical measure of fairness. All processes are _not_ created equal. That said, even trying to do fairness by effective user ID would probably already do a lot. In a desktop environment, X would get as much CPU time as the user processes, simply because it's in a different protection domain (and that's really what effective user ID means: it's not about users, it's really about protection domains). And fairness by euid is probably a hell of a lot easier to do than trying to figure out the wakeup matrix.

You probably want to consider the controlling terminal as well... do you want to have people starting 'at' jobs competing on equal footing with people typing at a terminal? I'm not offering an answer, just raising the question. And for some database applications, everyone in a group may connect with the same login-id, then do sub-authorization to the database application; euid may be an issue there as well.

--
Bill Davidsen [EMAIL PROTECTED]
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Ingo Molnar wrote: * Davide Libenzi [EMAIL PROTECTED] wrote:

The same user nicing two different multi-threaded processes would expect a predictable CPU distribution too. [...]

i disagree that the user 'would expect' this. Some users might. Others would say: 'my 10-thread rendering engine is more important than a 1-thread job because it's using 10 threads for a reason'. And the CFS feedback so far strengthens this point: the default behavior of treating the thread as a single scheduling (and CPU time accounting) unit works pretty well on the desktop.

If by desktop you mean one and only one interactive user, that's true. On a shared machine it's very hard to preserve any semblance of fairness when one user gets far more than another, based not on the value of what they're doing but on the tools they use to do it.

think about it in another, 'kernel policy' way as well: we'd like to _encourage_ more parallel user applications. Hurting them by accounting all threads together sends the exact opposite message.

Why is that? There are lots of things which are intrinsically single-threaded; how are we hurting multi-threaded applications by refusing to give them more CPU than an application running on behalf of another user? By accounting all threads together we encourage writing an application in the most logical way. Threads are a solution, not a goal in themselves.

[...] Doing that efficiently (the old per-cpu run-queue is pretty nice from many POVs) is the real challenge.

yeah. Ingo

--
Bill Davidsen [EMAIL PROTECTED]
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
[RFC] another scheduler beater
The small attached script does a nice job of showing animation glitches in the glxgears animation. I have run one set of tests, and will have several more tomorrow. I'm off to a poker game, and would like to let people draw their own conclusions.

Based on just this script as load, I would say renice on X isn't a good thing. Based on one small test, I would say that renice of X in conjunction with heavy disk i/o and a single fast-scrolling xterm (think kernel compile) seems to slow the raid6 thread measurably. Results late tomorrow, it will be an early and long day :-(

[Attachment: glitch1.sh - Bourne shell script]
[REPORT] First glitch1 results, 2.6.21-rc7-git6-CFSv5
I am not sure a binary attachment will go through; I will move to the web site if not.

[Attachment: GL2.6.21-rc7-git6-CFSv5_nice0_jump - binary data]
[Attachment: GL2.6.21-rc7-git6-CFSv5_nice0_nojump - binary data]
[Attachment: GL2.6.21-rc7-git6-CFSv5_nice19_nojump - binary data]
[Attachment: GL2.6.21-rc7-git6-CFSv5_nice-19_jump - binary data]
Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
Linus Torvalds wrote: On Mon, 5 Mar 2007, Ed Tomlinson wrote:

The patch _does_ make a difference. For instance, reading mail with freenet working hard (threaded java application) and gentoo's emerge triggering compiles to update the box is much smoother. Think this scheduler needs serious looking at.

I agree, partly because it's obviously been getting rave reviews so far, but mainly because it looks like you can think about behaviour a lot better, something that was always very hard with the interactivity boosters with process state history. I'm not at all opposed to this, but we do need:
- to not do it at this stage in the stable kernel
- to let it sit in -mm for at least a short while
- and generally more people testing more loads.

Please, could you now rethink pluggable schedulers as well? Even if one had to be chosen at boot time and couldn't be changed thereafter, it would still allow a few new thoughts to be included.

I don't actually worry too much about switching out a CPU scheduler: those things are places where you *can* largely read the source code and get an idea for them (although with the kind of history state that we currently have, it's really really hard). But at the very least they aren't likely to have subtle bugs that show up elsewhere, so...

I confess that the default scheduler works for me most of the time; i/o tuning is more productive. I want to test with a kvm load, but 2.6.21-rc3-git3 doesn't want to run kvm at all; I'm looking to see what I broke, since nbd doesn't work either. I'm collecting OOPSes now, will forward when I have a few more.

So as long as the generic concerns above are under control, I'll happily try something like this if it can be merged early in a merge window..

Linus

--
Bill Davidsen [EMAIL PROTECTED]
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked."
- from Slashdot
Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
Con Kolivas wrote: On Wednesday 07 March 2007 04:50, Bill Davidsen wrote:

With luck I'll get to shake out that patch in combination with kvm later today.

Great, thanks! I've appreciated all the feedback so far.

I did try; 2.6.21-rc3-git3 doesn't want to run kvm for me, and your patch may not be doing what it should. I'm falling back to 2.6.20 and will retest after I document my kvm issues.

--
Bill Davidsen [EMAIL PROTECTED]
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: Question: schedule()
albcamus wrote:

your kthread IS preemptible unless you call preempt_disable or some locking functions explicitly.

I think he's trying to go the other way: make his thread the highest priority to blow anything else in the system out of the water. See his previous post, "how to make kernel thread more faster?"

--
Bill Davidsen [EMAIL PROTECTED]
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
Linus Torvalds wrote: On Thu, 8 Mar 2007, Bill Davidsen wrote:

Please, could you now rethink pluggable schedulers as well? Even if one had to be chosen at boot time and couldn't be changed thereafter, it would still allow a few new thoughts to be included.

No. Really. I absolutely *detest* pluggable schedulers. They have a huge downside: they allow people to think that it's ok to make special-case schedulers.

But it IS okay for people to make special-case schedulers. Because it's MY machine, and how it behaves under mixed load is not a technical issue, it's a POLICY issue, and therefore the only way you can allow the admin to implement that policy is to either provide several schedulers or to provide all sorts of tunable knobs. And by having a few schedulers which have been heavily tested and reviewed, you can define the policy each scheduler implements and document it. Instead of people writing their own, or hacking the code, they would have a few well-tested choices, with known policy goals.

And I simply very fundamentally disagree. If you want to play with a scheduler of your own, go wild. It's easy (well, you'll find out that getting good results isn't, but that's a different thing). But actual pluggable schedulers just cause people to think that oh, the scheduler performs badly under circumstance X, so let's tell people to use special scheduler Y for that case.

And has that been a problem with io schedulers? I don't see any vast proliferation of them, I don't see contentious exchanges on LKML, or people asking how to get yet another into mainline. In fact, I would say that the io scheduler situation is about as right as anything can be: choices for special cases, and a lack of requests for something else.

And CPU scheduling really isn't that complicated. It's *way* simpler than IO scheduling. There simply is *no*excuse* for not trying to do it well enough for all cases, or for having special-case stuff.
This supposes that the desired behavior, the policy, is the same on all machines, or that there is currently a way to set the target. If I want interactive response with no consideration for batch (and can't trust users to use nice), I want one policy. If I want a compromise, the current scheduler or RSDL are candidates, but they do different things.

But even IO scheduling actually ends up being largely the same. Yes, we have pluggable schedulers, and we even allow switching them, but in the end, we don't want people to actually do it. It's much better to have a scheduler that is good enough than it is to have five that are perfect for five particular cases.

We not only have multiple io schedulers, we have many tunable io parameters, all of which allow people to make their systems behave the way they think is best. It isn't causing complaint, confusion, or instability. We have many people requesting a different CPU scheduler, so obviously what we have isn't good enough, and I doubt any one scheduler can be, given that the target behavior is driven by non-technical choices.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
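The run-time io scheduler switching mentioned above works through sysfs. A sketch (the device name sda is an assumption, and the sysfs file only exists on a real block device, so the example falls back to sample output when it is absent):

```shell
# Query the io scheduler for one device via sysfs; the current choice
# appears in brackets. /sys/block/sda is an assumption about the system.
q=/sys/block/sda/queue/scheduler
if [ -r "$q" ]; then
    cat "$q"
else
    echo "noop anticipatory deadline [cfq]   (example output)"
fi
```

Switching is a one-line write as root, e.g. `echo deadline > $q`, and the per-scheduler tunables the post alludes to live under /sys/block/sda/queue/iosched/.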
Re: [PATCH] fix read past end of array in md/linear.c
Andy Isaacson wrote: When iterating through an array, one must be careful to test one's index variable rather than another similarly-named variable. The loop will read off the end of conf->disks[] in the following (pathological) case:

% dd bs=1 seek=840716287 if=/dev/zero of=d1 count=1
% for i in 2 3 4; do dd if=/dev/zero of=d$i bs=1k count=$(($i+150)); done
% ./vmlinux ubd0=root ubd1=d1 ubd2=d2 ubd3=d3 ubd4=d4
# mdadm -C /dev/md0 --level=linear --raid-devices=4 /dev/ubd[1234]

adding some printks, I saw this:

[42949374.96] hash_spacing = 821120
[42949374.96] cnt = 4
[42949374.96] min_spacing = 801
[42949374.96] j=0 size=820928 sz=820928
[42949374.96] i=0 sz=820928 hash_spacing=820928
[42949374.96] j=1 size=64 sz=64
[42949374.96] j=2 size=64 sz=128
[42949374.96] j=3 size=64 sz=192
[42949374.96] j=4 size=1515870810 sz=1515871002

Index: linus/drivers/md/linear.c
===================================================================
--- linus.orig/drivers/md/linear.c	2007-03-02 11:35:55.0 -0800
+++ linus/drivers/md/linear.c	2007-03-07 13:10:30.0 -0800
@@ -188,7 +188,7 @@
 	for (i=0; i < cnt-1 ; i++) {
 		sector_t sz = 0;
 		int j;
-		for (j=i; i<cnt-1 && sz<min_spacing ; j++)
+		for (j=i; j<cnt-1 && sz<min_spacing ; j++)
 			sz += conf->disks[j].size;
 		if (sz >= min_spacing && sz < conf->hash_spacing)
 			conf->hash_spacing = sz;

After looking at that code, I have to wonder how this ever worked, or if in fact anyone ever took this path. I assume that the value of sz caused the loop exit in all cases, since this has been in the code at least since 2.6.15, the oldest thing I have handy.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
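The effect of the one-character fix can be checked outside the kernel. This is a shell rendering of the fixed inner loop, using the disk sizes from the printk trace in the report; the index handling, not md itself, is what is being demonstrated:

```shell
# Fixed inner loop in shell: with j (not i) tested against cnt-1, the
# sum stops at the last array element instead of running past it.
# Sizes come from the printk trace in the report: 820928 64 64 64.
sizes="820928 64 64 64"
cnt=4; min_spacing=801
i=1                        # start summing at disk 1, as the kernel loop does
j=$i; sz=0
for s in $(echo $sizes | cut -d' ' -f$((i + 1))-); do
    [ $j -lt $((cnt - 1)) ] && [ $sz -lt $min_spacing ] || break
    sz=$((sz + s))
    j=$((j + 1))
done
echo $sz                   # 64 + 64 = 128; a fourth element is never read
```

With the original condition (`i < cnt-1`, always true here since i=1 and cnt=4), only `sz < min_spacing` could stop the loop, which is exactly the accidental exit the follow-up comment describes, and why j=4 appears in the trace.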
Re: [PROBLEM] Can't start MD devices by using /dev/disk/by-id
Patrick Ale wrote: Hi chaps, I just came home, rebooted my box to git8 and *gasp* a problem :) I can't start my MD devices anymore by defining /dev/disk/by-id/* devices in /etc/mdadm.conf. When I do a: mdadm --assemble /dev/md/1 it tells me No devices found for /dev/md/1. When I edit the file /etc/mdadm.conf and change the /dev/disk/by-id/* to whatever the symbolic link points to in the /dev directory, it does work.

Just out of curiosity, why did you do this in such a manual way instead of just using the UUID? I would think every time you replace a failed drive you would have to go edit the files all over again.

--
Bill Davidsen [EMAIL PROTECTED]
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
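Assembling by UUID, as the reply suggests, sidesteps device-name churn entirely. A sketch of the mdadm.conf entry implied here; the array name and UUID below are placeholders, not values from the report:

```
# /etc/mdadm.conf -- assemble by UUID so a replaced drive needs no edits.
# Regenerate the ARRAY lines after a rebuild with:  mdadm --examine --scan
DEVICE partitions
ARRAY /dev/md/1 UUID=c3a3e24b:6f2d81a9:1b4e28ba:9f2d4c1e
```

With this form, `mdadm --assemble --scan` matches member disks by the UUID stored in their superblocks, so it does not matter what /dev name (or by-id symlink) the kernel hands out on any given boot.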
Re: smp and irq conflict
Benny Amorsen wrote: "BD" == Bill Davidsen [EMAIL PROTECTED] writes:

BD: You may be able to move one board to another slot, but looking at the bandwidth I suspect you may need a server motherboard with multiple busses, preferably running at 66MHz 64bit. I don't think this is an interrupt problem, but you can just try capture on two channels which share an interrupt, like bttv0 and bttv7, to verify that.

66MHz 64bit isn't much fun when the capture cards are 33MHz 32bit.

It doesn't help the video-to-bus transfer, but multiple busses, to give a bus per card, would help; and assuming the data are being saved to disk using a decent disk controller which can use the additional bandwidth, at least some conflict is avoided or reduced. This is really a case of using general hardware to the utmost; I suspect more motherboard bandwidth will be needed somewhere.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
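The bandwidth point can be made concrete with the theoretical peaks involved; these are the standard PCI figures (clock times bus width), not measurements from the thread:

```shell
# Theoretical peak PCI bandwidth in MB/s: clock (MHz) x bus width (bytes).
# (The conventional bus is nominally 33.33 MHz, so datasheets say ~133 MB/s.)
pci32=$((33 * 4))    # conventional 33 MHz, 32-bit slot
pci64=$((66 * 8))    # 66 MHz, 64-bit server slot
echo "32-bit/33MHz: ${pci32} MB/s, shared by every card on the bus"
echo "64-bit/66MHz: ${pci64} MB/s"
```

Several capture cards plus a disk controller all share that single ~132 MB/s on a desktop board, which is why a second host bus helps even though each capture card individually stays 32-bit/33MHz.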
Re: AHCI - remove probing of ata2
Greg Trounson wrote: At the risk of sounding like a "me too" post: I also have an Asus P5W-DH, with the following drives connected:

SATA: ST3250820AS, connected to sata1
PATA: HL-DT-ST GSA-H12N, ATAPI DVD Writer, Primary master

On bootup of 2.6.19 and 2.6.20, the kernel stalls for 1 minute when probing sata2, eventually giving up and continuing the boot process. There is no physical sata2 connector on the motherboard, just solder lugs between sata1 and sata3. From other users I understand this is really a Silicon Image SIL4723 SATA to 2-port SATA splitter. It is detected by the kernel as a disk, as below. The relevant part of the boot process looks like:

...
libata version 2.00 loaded.
ahci :00:1f.2: version 2.0
ACPI: PCI Interrupt :00:1f.2[B] - GSI 23 (level, low) - IRQ 22
PCI: Setting latency timer of device :00:1f.2 to 64
ahci :00:1f.2: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl SATA mode
ahci :00:1f.2: flags: 64bit ncq led clo pio slum part
ata1: SATA max UDMA/133 cmd 0xF882A900 ctl 0x0 bmdma 0x0 irq 219
ata2: SATA max UDMA/133 cmd 0xF882A980 ctl 0x0 bmdma 0x0 irq 219
ata3: SATA max UDMA/133 cmd 0xF882AA00 ctl 0x0 bmdma 0x0 irq 219
ata4: SATA max UDMA/133 cmd 0xF882AA80 ctl 0x0 bmdma 0x0 irq 219
scsi0 : ahci
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7, max UDMA/133, 488397168 sectors: LBA48 NCQ (depth 31/32)
ata1.00: ata1: dev 0 multi count 16
ata1.00: configured for UDMA/133
scsi1 : ahci
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
...waits 20 seconds...
ata2.00: qc timeout (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x104)
...waits 5 seconds...
ata2: port is slow to respond, please be patient (Status 0x80)
...waits 30 seconds...
ata2: port failed to respond (30 secs, Status 0x80)
ata2: COMRESET failed (device not ready)
ata2: hardreset failed, retrying in 5 secs
...waits 5 seconds...
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: ATA-6, max UDMA/133, 640 sectors: LBA
ata2.00: ata2: dev 0 multi count 1
ata2.00: configured for UDMA/133
scsi2 : ahci
ata3: SATA link down (SStatus 0 SControl 300)
...

A bit of poking about shows:

fdisk -l /dev/sdb
Disk /dev/sdb: 0 MB, 327680 bytes
255 heads, 63 sectors/track, 0 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdb doesn't contain a valid partition table

So it presents itself as a 320k disk, filled with zeroes as below:

dd if=/dev/sdb | hexdump
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0050000
640+0 records in
640+0 records out
327680 bytes (328 kB) copied, 0.0106662 seconds, 30.7 MB/s

Note that this is not a fatal error. The machine still boots eventually, but the seemingly mandatory 60 second pause makes startup rather cumbersome for the user. So far none of the suggested fixes have managed to stop ata2 from being detected (noprobe=ata2, irqpoll, etc.). I understand this problem wasn't present in 2.6.16, so the problem must lie in some patch since then. I see Tejun is working towards patches for this and I would be happy to try them here.

Is this 320k of cache memory, or in any way some actual storage on the system? Have you tried to write to it, out of curiosity? Seems odd that it would be detected if there were nothing at all present, although obviously it could be an artifact.

-- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
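For anyone wanting to script the same "is it all zeroes?" check that the dd|hexdump above performs, a minimal sketch (the /dev/sdb path is from the report and requires root to read; the sample buffer below just stands in for it):

```python
# Check whether a byte stream is entirely zero-filled, as dd|hexdump
# suggests for the 320 KB pseudo-disk in the report.
def all_zero(chunks):
    """True if every byte in every chunk is zero."""
    return all(b == 0 for chunk in chunks for b in chunk)

# Real use would be something like:
#   with open("/dev/sdb", "rb") as f:
#       all_zero(iter(lambda: f.read(65536), b""))
sample = [bytes(65536)] * 5   # 5 * 64 KiB = 320 KiB of zeroes
print(all_zero(sample))
```

Writing to the device and reading it back would answer the cache-vs-artifact question, though writing to an unidentified device is obviously at-your-own-risk.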
[2.6.20-get13] KVM-12 won't build
The build fails with an error message:

cc -I /home/davidsen/downloads/kernel.org/linux-2.6.20-git13/include -MMD -MF ./.kvmctl.d -g -c -o kvmctl.o kvmctl.c
kvmctl.c:29:2: error: #error libkvm: userspace and kernel version mismatch
make[1]: *** [kvmctl.o] Error 1

I don't see a kvm-13 on the KVM website.

-- Bill Davidsen He was a full-time professional cat, not some moonlighting ferret or weasel. He knew about these things.
Excessive dmesg whining in 2.6.20-git13
The good news is that this kernel boots, so I can start testing. However, it seems to have a LOT of trouble coping with the idea that my only IDE device is a DVD burner. I am guessing from the hundreds of lines of nbd whining that nbd doesn't work; testing will continue after I go plow more snow. If all this whining doesn't indicate a problem, I might suggest eliminating it, since it tends to hide any real problem. Compressed dmesg and config attached.

-- Bill Davidsen He was a full-time professional cat, not some moonlighting ferret or weasel. He knew about these things.

config.gz Description: GNU Zip compressed data
dmesg-2.6.20-git13.bz2 Description: BZip2 compressed data
Re: [2.6.20-get13] KVM-12 won't build
Joerg Roedel wrote:

On Fri, Feb 16, 2007 at 11:32:13AM -0500, Bill Davidsen wrote: Goes out with an error message: cc -I /home/davidsen/downloads/kernel.org/linux-2.6.20-git13/include -MMD -MF ./.kvmctl.d -g -c -o kvmctl.o kvmctl.c kvmctl.c:29:2: error: #error libkvm: userspace and kernel version mismatch make[1]: *** [kvmctl.o] Error 1 I don't see a kvm-13 on the KVM website.

You will find the kvm-13 release in the SourceForge download area of KVM[1]. Kvm-12 is still required for 2.6.20 kernels.

Joerg

[1] http://sourceforge.net/project/showfiles.php?group_id=180599

I'll look; the download off the home page didn't seem to have it.

-- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979
Re: [ANNOUNCE] DualFS: File System with Meta-data and Data Separation
Jörn Engel wrote:

On Thu, 15 February 2007 23:59:14 +0100, Juan Piernas Canovas wrote: Actually, the version of DualFS for Linux 2.4.19 implements a cleaner. In our case, the cleaner is not really a problem because there is not too much to clean (the meta-data device only contains meta-data blocks, which are 5-6% of the file system blocks; you do not have to move data blocks).

That sounds as if you have not hit the interesting cases yet. Fun starts when your device is near-full and you have a write-intensive workload. In your case, that would be metadata-write-intensive. For one, this is where write performance of log-structured filesystems usually goes down the drain. And worse, it is where the cleaner can run into a deadlock. Being good where log-structured filesystems usually are horrible is a challenge. And I'm sure many people are more interested in those performance numbers than in the ones you shine at. :)

Actually I am interested in the common case, where the machine is not out of space, or memory, or CPU, but is appropriately sized to the workload. Not that I lack interest in corner cases, but the running-flat-out case doesn't reflect the case where there's enough hardware and the o/s just needs to use it well. The one high-load benchmark I would love to see is a web server, running tux, with a load over a large (number of files) distributed data set. The much faster tar create times posted make me think that a server with a lot of files would benefit, when CPU and memory requirements are not a bottleneck.

-- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: [ANNOUNCE] DualFS: File System with Meta-data and Data Separation
Jörn Engel wrote:

On Fri, 16 February 2007 18:47:48 -0500, Bill Davidsen wrote: Actually I am interested in the common case, where the machine is not out of space, or memory, or CPU, but when it is appropriately sized to the workload. Not that I lack interest in corner cases, but the running flat out case doesn't reflect case where there's enough hardware, now the o/s needs to use it well.

There is one detail about this specific corner case you may be missing. Most log-structured filesystems don't just drop in performance - they can run into a deadlock, and the only recovery from this is the lovely backup-mkfs-restore procedure.

I missed that. Which corner case did you find triggers this in DualFS? If it were just performance, I would agree with you.

-- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979
Re: 2.6.20-mm1 - Oops using Minix 3 file system
Cédric Augonnet wrote:

2007/2/15, Andrew Morton [EMAIL PROTECTED]: Temporarily at http://userweb.kernel.org/~akpm/2.6.20-mm1/ Will appear later at Changes since 2.6.20-rc6-mm3: -minix-v3-support.patch

Hi Daniel,

On 2.6.20-rc6-mm3 and 2.6.20-mm1, I get an OOPS when using the Minix 3 file system. I enclose the dmesg and the .config to that mail. Here are the steps to reproduce this oops (they involve using qemu to run Minix 3):

- First create a 2GB image using qemu-img create minix.img 2G (please note that this seems to be producing an erroneous image)
- Then launch Minix inside qemu to make a minix partition on this image using mkfs on the corresponding device.

That's two steps, right? First you make a partition on the disk qemu provides, then you put a filesystem on the partition? Or did you put a filesystem on the raw device?

- Mount the image on loopback using mount -t minix -o loop minix.img /mnt/qemu/

Does mount know to use Minix 3 with this command line?

- Issue a df command on /mnt/qemu

This oops occurs every time I use df on this directory. However, this does not occur if the image was for a 1MB partition. And it does not occur if the partition on which we created minix.img was the same as the partition on which qemu stands. Sounds like qemu has an issue and creates an erroneous partition which linux does not handle correctly. Regards, and thanks for your patch by the way!

Having been burned a few times by the fact that qemu provides disk images which then (normally) get partitions, I'm not sure you aren't having the same problem. None of which justifies the OOPS, of course; nice kernels don't go down.

-- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked.
- from Slashdot
[RFC] Time for a linux-kvm mailing list?
There doesn't seem to be a great place for KVM user questions; at the moment this list is it. kvm-devel seems a poor place for user questions, while the chat room is real time and depends on the question and the answer being in the same place at the same time. Just a thought on getting a dialogue going in the right place.

-- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: [RFC] Time for a linux-kvm mailing list?
Avi Kivity wrote:

Bill Davidsen wrote: There doesn't seem to be a great place for KVM user questions, this is it, and kvm-devel seems a poor place for user questions, while the chat room is real time and depends on the question and the answer being in the same place at the same time.

kvm-devel is perfectly suitable for user queries.

Just a thought on getting a dialogue going in the right place.

You could have started by posting your idea on kvm-devel, where kvm developers and users would actually see it.

Why would I post it to a list where it's off-topic by list name? And how would anyone know that the list name can be ignored, when so many other lists with "devel" in the name tell people with user questions to go elsewhere? Right now only users who ignore list names would even look there. I was suggesting a way to improve user participation; since you don't think that's needed I'll stop trying to help. I guess since kvm needs more hardware you have fewer users and don't need a user support list like Xen.

-- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979
Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
Con Kolivas wrote: On Monday 12 March 2007 22:26, Al Boldi wrote: Con Kolivas wrote: On Monday 12 March 2007 15:42, Al Boldi wrote: Con Kolivas wrote: On Monday 12 March 2007 08:52, Con Kolivas wrote: And thank you! I think I know what's going on now. I think each rotation is followed by another rotation before the higher priority task is getting a look in in schedule() to even get quota and add it to the runqueue quota. I'll try a simple change to see if that helps. Patch coming up shortly. Can you try the following patch and see if it helps. There's also one minor preemption logic fix in there that I'm planning on including. Thanks! Applied on top of v0.28 mainline, and there is no difference. What's it look like on your machine? The higher priority one always get 6-7ms whereas the lower priority one runs 6-7ms and then one larger perfectly bound expiration amount. Basically exactly as I'd expect. The higher priority task gets precisely RR_INTERVAL maximum latency whereas the lower priority task gets RR_INTERVAL min and full expiration (according to the virtual deadline) as a maximum. That's exactly how I intend it to work. Yes I realise that the max latency ends up being longer intermittently on the niced task but that's -in my opinion- perfectly fine as a compromise to ensure the nice 0 one always gets low latency. I think, it should be possible to spread this max expiration latency across the rotation, should it not? There is a way that I toyed with of creating maps of slots to use for each different priority, but it broke the O(1) nature of the virtual deadline management. Minimising algorithmic complexity seemed more important to maintain than getting slightly better latency spreads for niced tasks. It also appeared to be less cache friendly in design. I could certainly try and implement it but how much importance are we to place on latency of niced tasks? 
Are you aware of any usage scenario where latency-sensitive tasks are ever significantly niced in the real world?

It depends on how you reconcile "completely fair" with order-of-magnitude blips in latency. It looks (from the results, not the code) as if nice is implemented by round-robin scheduling followed by once in a while just not giving the CPU to the niced task for a while. Given the smooth nature of the performance otherwise, it's more obvious than if you weren't doing such a good job most of the time. Ugly stands out more on something beautiful!

-- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
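The latency pattern described - round-robin quanta most of the time, then an intermittent long gap while the niced task sits out a full expiration - can be sketched with a toy model. The quanta and quota numbers below are illustrative, not taken from the RSDL source:

```python
# Toy model of the reported behavior: two tasks alternate in RR_INTERVAL
# quanta, but the niced task periodically waits out a full "expiration"
# period while the nice-0 task drains its remaining quota alone.
# All numbers are hypothetical, chosen only to show the shape of the blip.
RR_INTERVAL = 6   # ms per quantum
NICE0_QUOTA = 4   # quanta per rotation for the nice-0 task
NICED_QUOTA = 1   # quanta per rotation for the niced task

def niced_latencies(rotations):
    """Gap in ms between consecutive runs of the niced task."""
    gaps, clock, last_run = [], 0, 0
    for _ in range(rotations):
        # interleaved phase: the two tasks alternate quanta
        for _ in range(NICED_QUOTA):
            clock += RR_INTERVAL          # nice-0 runs one quantum
            gaps.append(clock - last_run) # wait the niced task just saw
            clock += RR_INTERVAL          # niced task runs one quantum
            last_run = clock
        # expiration phase: nice-0 drains its remaining quota alone
        clock += (NICE0_QUOTA - NICED_QUOTA) * RR_INTERVAL
    return gaps

gaps = niced_latencies(3)
print(gaps)  # the usual RR_INTERVAL gap, plus the intermittent 4x blip
```

In this model the niced task usually waits one RR_INTERVAL but periodically waits four, which is the "smooth, then KLUNK" signature described above.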
Re: is RSDL an unfair scheduler too?
Con Kolivas wrote: On Saturday 17 March 2007 23:28, Ingo Molnar wrote: * Con Kolivas [EMAIL PROTECTED] wrote: We're obviously disagreeing on what heuristics are [...] that could very well be so - it would be helpful if you could provide your own rough definition for the term, so that we can agree on how to call things? [ in any case, there's no rush here, please reply at your own pace, as your condition allows. I wish you a speedy recovery! ] You're simply cashing in on the deep pipes that do kernel work for other tasks. You know very well that I dropped the TASK_NONINTERACTIVE flag from rsdl which checks that tasks are waiting on pipes and you're exploiting it. Con, i am not 'cashing in' on anything and i'm not 'exploiting' anything. The TASK_NONINTERACTIVE flag is totally irrelevant to my argument because i was not testing the vanilla scheduler, i was testing RSDL. I could have written this test using plain sockets, because i was testing RSDL's claim of not having heuristics, i was not testing the vanilla scheduler. I have simply replied to this claim of yours: Despite the claims to the contrary, RSDL does not have _less_ heuristics, it does not have _any_. [...] and i showed you a workload under _RSDL_ that clearly shows that RSDL is an unfair scheduler too. my whole point was to counter the myth of 'RSDL has no heuristics'. Of course it has heuristics, which results in unfairness. (If it didnt have any heuristics that tilt the balance of scheduling towards sleep-intense tasks then a default Linux desktop would not be usable at all.) so the decision is _not_ a puristic do we want to have heuristics or not, the question is a more practical which heuristics are simpler, which heuristics are more flexible, which heuristics result in better behavior. Ingo Ok but please look at how it appears from my end (illness aside). 
I spent 3 years just diddling with scheduler code, trying my hardest to find a design that fixes a whole swag of problems we still have, and a swag of problems we might get with other fixes. You initially said you were pleased with this design. ..lots of code, testing, bugfixes and good feedback. Then Mike has one testcase that most other users disagree is worthy of being considered a regression. You latched onto that and basically called it a showstopper in spite of who knows how many other positive things. Then you quickly produce a counter patch designed to kill off RSDL with a config option for mainline. Then you boldly announce on LKML "is RSDL an unfair scheduler too?" with some test case you whipped up to try and find fault with the design.

No damn it! He's pointing out that you do have heuristics, they are just built into the design. And of course he's whipping up test cases; how else can anyone help you find corner cases where it behaves in an unexpected or undesirable manner? I think he's trying to help, please stop taking it personally.

What am I supposed to think? Considering just how many problems I have addressed and tried to correct with RSDL successfully, I'm surprised that despite your enthusiasm for it initially you have spent the rest of the time trying to block it. Please, either help me (and I'm in no shape to code at the moment despite what I have done so far), or say you have no intention of including it. I'm risking paralysis just by sitting at the computer right now so I'm dropping the code as is at the moment and will leave it up to your better judgement as to what to do with it.

Actually I think Ingo has tried to help get it in; that's his patch offered for CONFIG_SCHED_FAIR, lets people try it and all. Now for something constructive... by any chance is Mike running KDE instead of GNOME? I only had a short time to play because I had to look at another problem in 2.6.21-rc3 (nbd not working), so the test machine is in use.
But it looked as if behavior was not as smooth with KDE. Maybe that thought will be useful.

-- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: is RSDL an unfair scheduler too?
schedulers, because I don't believe any one can match the behavior goals of all users.

-- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: Pluggable Schedulers (was: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler)
David Lang wrote:

On Fri, 9 Mar 2007, Al Boldi wrote: My preferred sphere of operation is the Manichean domain of faster vs. slower, functionality vs. non-functionality, and the like. For me, such design concerns are like the need for a kernel to format pagetables so the x86 MMU decodes what was intended, or for a compiler to emit valid assembly instructions, or for a programmer to write C the compiler won't reject with parse errors. Sure, but I think, even from a technical point of view, competition is a good thing to have. Pluggable schedulers give us this kind of competition, that forces each scheduler to refine or become obsolete. Think evolution.

The point Linus is making is that with pluggable schedulers there isn't competition between them; the various developer teams would go off in their own direction, and any drawbacks to their scheduler could be answered with "that's not what we are good at, use a different scheduler", with the very real possibility that a person could get this answer from ALL schedulers, leaving them with nothing good to use.

Have you noticed that currently that is exactly what happens? If the default scheduler doesn't handle your load well you have the option of rewriting it and maintaining it, or doing without, or trying to fix your case without breaking others, or patching in some other, non-mainline, scheduler. The default scheduler has been around long enough that I don't see it being tuned for any A without making some B perform worse. Thus multiple schedulers are a possible solution. They don't need to be available as runtime choices; boot time selection would still allow reasonable testing. I can see myself using a compile time option and building multiple kernels, but not the average user.

-- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked.
- from Slashdot
Re: [PATCH 0/3] VM throttling: avoid blocking occasional writers
Andrew Morton wrote:

On Wed, 14 Mar 2007 21:42:46 +0900 Tomoki Sekiyama [EMAIL PROTECTED] wrote: ... -Solution: I consider that all of the dirty pages for the disk have been written back and that the disk is clean if a process cannot write 'write_chunk' pages in balance_dirty_pages(). To avoid using up the free memory with dirty pages by passing blocking, this patchset adds a new threshold named vm.dirty_limit_ratio to sysctl. It modifies balance_dirty_pages() not to block when the amount of Dirty+Writeback is less than vm.dirty_limit_ratio percent of the memory. In the other cases, writers are throttled as current Linux does. In this patchset, vm.dirty_limit_ratio, instead of vm.dirty_ratio, is used as the clamping level of Dirty+Writeback. And, vm.dirty_ratio is used as the level at which a writer will itself start writeback of the dirty pages.

Might be a reasonable solution - let's see what Peter comes up with too. Comments on the patch:

- Please don't VM_DIRTY_LIMIT_RATIO: just use CTL_UNNUMBERED and leave sysctl.h alone.
- The 40% default is already too high. Let's set this new upper limit to 40% and decrease the non-blocking ratio.
- Please update the procfs documentation in ./Documentation/
- I wonder if dirty_limit_ratio is the best name we could choose. vm_dirty_blocking_ratio, perhaps? Dunno.

I don't like it, but I dislike it less than dirty_limit_ratio, I guess. It would probably break things to change it now, including my sysctl.conf on a number of systems :-(

-- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: [ck] Re: RSDL v0.31
Kasper Sandberg wrote:

On Sun, 2007-03-18 at 08:38 +0100, Mike Galbraith wrote: On Sun, 2007-03-18 at 08:22 +0100, Radoslaw Szkodzinski wrote: I'd reckon KDE regresses because of kioslaves waiting on a pipe (communication with the app they're doing IO for) and then expiring. That's why splitting IO from an app isn't exactly smart. It should at least be run in another thread.

Hm. Sounds rather a lot like the... "X sucks, fix X and RSDL will rock your world. RSDL is perfect." ...that I've been getting.

Not really, only X sucks. KDE works at least as well with rsdl as vanilla. I don't know who originally said KDE works worse; wasn't it just something someone thought?

It was probably me, and I had the opinion that KDE is not as smooth as GNOME with RSDL. I haven't had time to measure, but using it for daily stuff for about an hour each way hasn't changed my opinion. Every once in a while KDE will KLUNK to a halt for 200-300ms doing mundane stuff like redrawing a page, scrolling, etc. I don't see it with GNOME.

-- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: is RSDL an unfair scheduler too?
Bill Huey (hui) wrote:

On Sun, Mar 18, 2007 at 06:24:40AM +0100, Willy Tarreau wrote: Dunno. I guess a lot of people would like to then manage the classes, which would be painful as hell. Sure! I wouldn't like people to point the finger on Linux saying "hey look, they can't write a good scheduler so you have to adjust the knobs yourself!". I keep in mind that Solaris' scheduler is very good, both fair and interactive. FreeBSD was good (I haven't tested for a long time). We should manage to get something good for most usages, and optimize later for specific uses.

Like I've said in a previous email, SGI schedulers have an interactive term in addition to the normal nice values. If RSDL ends up being too rigid for desktop use, then this might be a good idea to explore in addition to priority manipulation. However, it hasn't been completely proven that RSDL can't handle desktop loads and that needs to be completely explored first. It certainly seems like, from the .jpgs that were posted earlier in the thread regarding mysql performance, that RSDL seems to have improved performance for those setups, so it's not universally the case that it sucks for server loads. The cause of this performance difference has yet to be pinpointed.

I would say that RSDL is probably a bit better than default for server use, although if the server starves for CPU, interactive processing at the console becomes leisurely indeed. The only thing I would like to address is the order-of-magnitude blips in latency of niced processes, which may be solved by playing with time slices. Con hasn't really commented on that (or I haven't read down to it).

Also, bandwidth schedulers like this are a new critical development for things like the -rt patch. It would benefit greatly if the RSDL basic mechanisms (RR and deadlines) were to somehow slip into that patch and be used for a more strict -rt based scheduling class.
It would be the basis for first-class control over process resource usage and would be a first in Linux or any mainstream kernel.

I don't think that RSDL and -rt should be merged, but that's for Ingo and Con to discuss. I would love to see RSDL in mainline as soon as it is practical, marked as EXPERIMENTAL.

This would be a powerful addition to Linux as a whole and RSDL should not be dismissed without these considerations. If it can somehow be integrated into the kernel with interactivity concerns addressed, then it would be an all-out win for the kernel in both these areas.

I don't think there are a lot of places where it underperforms the default scheduler, and it avoids a lot of jackpot cases where an overloaded system really bogs down. I would like to see more varied testing before any changes are made, unless a simple change would improve consistency of latency.

-- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: [linux-pm] [2/6] 2.6.21-rc2: known regressions
Jim Gettys wrote:

On Sun, 2007-03-18 at 17:07 +0100, Ingo Molnar wrote: * Pavel Machek [EMAIL PROTECTED] wrote: Some day we may have modesetting support in the kernel for some graphics hw, right now it's pretty damn spotty. Yep, that's the way to go. hey, i wildly supported this approach ever since 1996, when GGI came up :-/

So wildly you wrote tons of code ;-). More seriously, at the time, XFree86 would have spat in your face for any such thing. Thankfully, times are changing. Also more seriously, a somewhat hybrid approach is in order for mode setting: simple mode setting isn't much code and is required for sane behavior on crash (it is nice to get oopses onto a screen); but the full blown mode setting/configuration problem is so large that on some hardware, it is likely best left to a helper process (not the X server).

Also key to get sane behavior out of the scheduler is to get the X server to yield (sleep in the kernel) rather than busy waiting when the GPU is busy; a standardized interface for this for both fbdev and dri is in order. Right now, X is a misbehaving compute bound process rather than the properly interactive process it can/should/will be, releasing the CPU whenever the hardware is busy. Needless to say, this wastes cycles and hurts interactivity with just about any scheduler you can devise. It isn't as if this is hard; on UNIX systems we did it in 1984 or thereabouts.

What you say sounds good, assuming that the cost of a sleep is less than the cost of the busy wait. But this may be hardware dependent: the waits may be very small and frequent, and if it's hitting a small hardware window like retrace, delays in response will cause the time period to be missed completely. This is probably less critical with very smart cards; many of us don't run them.
Of course, in 1996, XFree86 would have ignored any such interfaces, in its insane quest for operating system independent user space drivers requiring no standard kernel interfaces (it is the second part of this where the true insanity lay). - Jim

-- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: [linux-pm] [2/6] 2.6.21-rc2: known regressions
Jim Gettys wrote:

On Mon, 2007-03-19 at 16:33 -0400, Bill Davidsen wrote: What you say sounds good, assuming that the cost of a sleep is less than the cost of the busy wait. But this may be hardware, the waits may be very small and frequent, and if it's hitting a small hardware window like retrace, delays in response will cause the time period to be missed completely. This probably less critical with very smart cards, many of us don't run them.

Actually, various strategies involving short busy waiting, or looking at DMA address registers before sleeping, were commonplace. But a syscall/sleep/wakeup is/was pretty fast. If you have an operation blitting the screen (e.g. scrolling), it takes a bit of time for the GPU to execute the command. I see this right now on OLPC, where a wonderful music application needs to scroll (most of) the screen left, periodically, and we're losing samples sometimes at those operations.

None of that conflicts with what I said, but what works on an LCD may not be appropriate for a CRT. With even moderate [EMAIL PROTECTED] timing the horizontal retrace happens ~50k/sec, and that's not an appropriate syscall rate. I'm just pointing out that some things a video interface does with simple hardware involve lots of very small windows. Don't read that as "don't do it", just be careful HOW you do it.

Remember also, that by being nice to everyone else by sleeping, there are more cycles to go around, and the scheduler can nicely boost the X server's priority as it will for interactive processes that are being cooperative.

I'm going to cautiously guess that the problem might be not how much but how soon. That is, latency might be more important than giving the server a lot of CPU.
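The retrace-rate arithmetic goes roughly like this: the horizontal retrace fires once per scanline, so its rate is the vertical refresh times the total line count including blanking. The blanking figure below is a typical value for illustration, not taken from any specific modeline:

```python
# Back-of-envelope horizontal retrace (hsync) rate.
# hsync rate = refresh_hz * total scanlines (visible + blanking).
def hsync_khz(refresh_hz, visible_lines, blank_lines=38):
    """Approximate horizontal retrace rate in kHz (blank_lines is a
    representative blanking figure, an assumption for this sketch)."""
    return refresh_hz * (visible_lines + blank_lines) / 1000

print(hsync_khz(60, 768))  # 1024x768 at 60 Hz: roughly 48 kHz
```

Tens of thousands of retrace windows per second is indeed far beyond a sane syscall rate, which supports the point that per-retrace sleeping is not the granularity to aim for on a CRT.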
-- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979
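The ~50k/sec figure is the crux of the retrace argument: that is one window every 20 microseconds, far below the granularity of an ordinary timed sleep. A quick Python sketch makes the point (illustrative timing, not a benchmark; the 20 µs window size is derived from the ~50k/sec rate above):

```python
import time

# One horizontal retrace window every 20 us at ~50k/sec.
WINDOW_S = 20e-6

# Time a nominal 1 ms sleep. Even a sleep this short typically
# overshoots its target, so sleeping once per 20 us retrace window
# is hopeless: the server must either batch work per frame or
# busy-wait for windows that small.
t0 = time.perf_counter()
time.sleep(0.001)
elapsed = time.perf_counter() - t0

print(f"1 ms sleep actually took {elapsed * 1e6:.0f} us")
print(f"retrace windows that passed meanwhile: {elapsed / WINDOW_S:.0f}")
```

The numbers vary by machine, but the conclusion does not: at least dozens of 20 µs windows elapse during even the shortest practical sleep.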
Re: [PATCH 0/3] VM throttling: avoid blocking occasional writers
Tomoki Sekiyama wrote: Hi, Thanks for your comments. I'm sorry for my late reply. Bill Davidsen wrote: Andrew Morton wrote: - I wonder if dirty_limit_ratio is the best name we could choose. vm_dirty_blocking_ratio, perhaps? Dunno. I don't like it, but I dislike it less than dirty_limit_ratio I guess. It would probably break things to change it now, including my sysctl.conf on a number of systems :-( I'm wondering which interface is preferred... 1) Just rename dirty_limit_ratio to dirty_blocking_ratio. Those who had been changing dirty_ratio should additionally modify dirty_blocking_ratio in order to determine the upper limit of dirty pages. 2) Change dirty_ratio to a vector consisting of 2 values: {blocking ratio, writeback starting ratio}. For example, to change both values: # echo 40 35 > /proc/sys/vm/dirty_ratio And to change only the first one: # echo 20 > /proc/sys/vm/dirty_ratio In the latter case the writeback starting ratio is regarded as the same as the blocking ratio if the writeback starting ratio is smaller. And then, the kernel behaves similarly to the current kernel. 3) Use dirty_ratio as the blocking ratio. And add start_writeback_ratio, and start writeback at start_writeback_ratio(default:90) * dirty_ratio / 100 [%]. In this way, specifying the blocking ratio can be done in the same way as in the current kernel, but the high/low watermark algorithm is enabled. I like 3 better, it should make tuning behavior more precise. You can make an argument for absolute values for writeback: if my disk will only write 70MB/s I may only want 2-3 sec of pending writes, regardless of available memory. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979
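The arithmetic behind option (3) is easy to model. A Python sketch follows; the knob names and the default of 90 come from the proposal in this thread, not from any released kernel, so treat them as hypothetical:

```python
def writeback_thresholds(total_mem_mb, dirty_ratio, start_writeback_ratio=90):
    """Model option (3) from the thread: writers start background
    writeback at start_writeback_ratio * dirty_ratio / 100 percent
    of memory, and block outright at dirty_ratio percent."""
    start_pct = start_writeback_ratio * dirty_ratio / 100
    start_mb = total_mem_mb * start_pct / 100
    block_mb = total_mem_mb * dirty_ratio / 100
    return start_mb, block_mb

# A 4 GB box with dirty_ratio=40: writeback starts at 36% of memory
# (1474.56 MB) and writers block at 40% (1638.4 MB).
print(writeback_thresholds(4096, 40))
```

The absolute-value point falls out directly: on this example the gap between the two thresholds is about 164 MB, which at 70MB/s is roughly 2-3 seconds of writeback.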
Re: [PATCH 0/3] VM throttling: avoid blocking occasional writers
Tomoki Sekiyama wrote: Hi, Thanks for your reply. 3) Use dirty_ratio as the blocking ratio. And add start_writeback_ratio, and start writeback at start_writeback_ratio(default:90) * dirty_ratio / 100 [%]. In this way, specifying the blocking ratio can be done in the same way as in the current kernel, but the high/low watermark algorithm is enabled. I like 3 better, it should make tuning behavior more precise. Then, what do you think of the following idea? (4) add `dirty_start_writeback_ratio' as the percentage of memory at which a generator of dirty pages itself starts writeback (that is, the non-blocking ratio). In this way, `dirty_ratio' is used as the blocking ratio, so we don't need to modify sysctl.conf etc. I think it's easier to understand for administrators of systems, because the interface is similar to `dirty_background_ratio' and `dirty_ratio.' If this is OK, I'll repost the patch. It sounds good to me, just be sure behavior is sane for both blocking less than start_writeback and vice versa. You can make an argument for absolute values for writeback: if my disk will only write 70MB/s I may only want 2-3 sec of pending writes, regardless of available memory. To realize tuning with absolute values, I consider that we need to modify handling of `dirty_background_ratio,' `dirty_ratio' and so on, as well as `dirty_start_writeback_ratio.' I think this should be done in another patch if this feature is required. Regards, -- Tomoki Sekiyama Hitachi, Ltd., Systems Development Laboratory -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979
Re: Question: half-duplex and full-duplex serial driver
Mockern wrote: Hi, Could you help me please: how can I make my serial driver work in half-duplex and full-duplex mode? Thank you Since you don't seem to have gotten an answer, and while this is probably the wrong list for your question, I can give you a pointer which may help. The communications program kermit can do this; google for the source, or try kermit.columbia.edu first, and read the source to see how they do it. I'm reasonably sure ioctl() is the answer, but that's choice three for your research. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979
Re: [OT] the shortest thread of LKML !
Willy Tarreau wrote: On Wed, Mar 28, 2007 at 01:02:10PM -0700, David Miller wrote: Please nobody reply to his posting, I'm shit-canning this thread from the start as it's nothing but flame fodder. He forgot the most important thing: there are *many* benevolent dictators, all with their own domain of excellence ;-) Good catch, David, you're like a spider on a web waiting for the naive intruder ! Posted several days too early for April Fool... -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: was once: Samsung DVD writer.
[EMAIL PROTECTED] wrote: Hi, BTW: What happened to FreeBSD User Giacomo and his Samsung DVD writer ? Any news to report ? A happy ending perchance ? Bill Davidsen: [about filtering dangerous SCSI commands] My suggestion would be to add an ioctl, like SET_SCSI_UNFILTERED, which can only be used as root, and which would allow SCSI commands sent to a device to be persistently set unfiltered. I understand that would be for programs like firmware updaters, but not for the vanilla purpose of bringing data onto an optical disc ? Because ... if this is intended for daily usage: I do not think that burning a disc on a desktop should necessarily require root privileges. Let's leave it to the sysadmin (or distro) to decide who may burn or, respectively, endanger the device by malicious use of normally harmless SCSI commands. (Like overworking the drive tray motor ?) I don't see a problem here requiring root. Once the filter is turned off for the device, say in rc.local, and the ownership and permissions are set, there's no issue. It can be owned by root, group cdwriters, have permissions 0660, and cdrecord or other trusted programs can be setgid once the kernel stops blocking such access. After all, what is gained if one performs daily tasks as a privileged user ? That only pierces the protection against absent-minded mistakes and involuntary backdoors. No need to do any daily tasks in a dangerous mode; access would be limited to users and programs regarded as trusted. Actually I try to stay away from any kernel peculiarities so I do not get addicted to something that might change. Thus a pointer to a list of forbidden commands would be welcome. If this could be added it would presumably not change. Maybe cdrskin was up to now only tested on totally insecure systems. After all I never got reports of the ominous command filtering interfering with burning. If it prevents any of libburn's SCSI commands from being executed then it does this silently and does not prevent burning success. 
No, as I noted, programs other than cdrecord are clever enough to avoid requiring disallowed commands. I would like to know which commands, and cease sending them. :)) libburn SCSI command list (commands in brackets are defined but not in normal use): spc.c: 00h, 03h, 12h, 1Eh, 55h, 5Ah, sbc.c: 1Bh, mmc.c: 04h, 23h, 2Ah, 35h, 43h, 46h, 4Ah, 51h, 52h, 53h, 54h, 5Bh, 5Ch, 5Dh, A1h, (AAh), ACh, B6h, BBh, (BEh), Have a nice day :) I put lkml back on the recipients; I'm suggesting a new ioctl as a way around the decision to no longer have setuid/setgid actually fully functional. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979
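The filtering being discussed is essentially an allow-list keyed on the CDB opcode (at the time it lived in the kernel's block/scsi_ioctl.c). A toy user-space model makes the root/non-root split concrete; the allowed set below is just libburn's opcode list quoted above, used as sample data, and SET_SCSI_UNFILTERED is the hypothetical ioctl proposed in this thread, not a real interface:

```python
# Toy model of per-opcode SCSI command filtering. The allowed set is
# libburn's normal-use opcode list from this message (bracketed
# opcodes AAh and BEh omitted); the kernel's real table differs.
LIBBURN_OPCODES = {
    0x00, 0x03, 0x12, 0x1E, 0x55, 0x5A,          # spc.c
    0x1B,                                        # sbc.c
    0x04, 0x23, 0x2A, 0x35, 0x43, 0x46, 0x4A,    # mmc.c
    0x51, 0x52, 0x53, 0x54, 0x5B, 0x5C, 0x5D,
    0xA1, 0xAC, 0xB6, 0xBB,
}

def command_permitted(cdb, allowed, unfiltered=False):
    """Return True if the CDB's first byte (the opcode) passes the
    filter. `unfiltered` models the proposed per-device
    SET_SCSI_UNFILTERED state (hypothetical)."""
    if unfiltered:
        return True
    return cdb[0] in allowed

print(command_permitted(bytes([0x12, 0, 0, 0, 36, 0]), LIBBURN_OPCODES))  # INQUIRY: True
print(command_permitted(bytes([0x3B]), LIBBURN_OPCODES))                  # WRITE BUFFER: False
print(command_permitted(bytes([0x3B]), LIBBURN_OPCODES, unfiltered=True)) # True
```

The point of the proposal is exactly the third call: once root flips the per-device unfiltered bit, trusted setgid programs get the dangerous opcodes without running as root themselves.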
Re: [2.6.21-rc5-git1][KVM] exception on start VM
Avi Kivity wrote: Bill Davidsen wrote: Starting a VM for Win98SE: [ debug snip info ] Known issue. It will be a while before we can support the '95 family on Intel as it makes heavy use of real mode. Thanks for the quick answer, I'll investigate other virtualization solutions. Please copy kvm-devel@lists.sourceforge.net on kvm issues, as per MAINTAINERS. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: [patch] remove artificial software max_loop limit
Jan Engelhardt wrote: On Apr 1 2007 11:10, Ken Chen wrote: On 4/1/07, Tomas M [EMAIL PROTECTED] wrote: I believe that IF you _really_ need to preserve the max_loop module parameter, then the parameter should _not_ be ignored, rather it should have the same function as before - to limit the loop driver, so if you use max_loop=10 for example, it should not allow loop.c to create more than 10 loops. Blame it on the dual meaning of max_loop that it uses currently: to initialize a set of loop devices and, as a side effect, to set the upper limit. People are complaining about the former constraint, aren't they? Does anyone use the 2nd meaning of upper limit? Who cares if the user specifies max_loop=8 but is still able to open up /dev/loop8, loop9, etc.? max_loop=X basically meant (at least to me) have at least X loops ready. You have just come up with a really good reason not to do unlimited loops. With the current limit people can count on a script mounting files, or similar, to neither loop for a VERY long time nor eat their memory. Whatever you think of programs without limit checking, this falls in the range of expecting an unsigned char to have a certain upper bound, and argues that the default limit should be the current limit and that setting a lower bound should work as a real and enforced limit. If a new capability is being added, and I think it's a great one, then people using the capability should be the ones explicitly doing something different. Plauger's law of least astonishment. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: [PATCH resend][CRYPTO]: RSA algorithm patch
Tasos Parisinos wrote: Andi Kleen wrote: Tasos Parisinos [EMAIL PROTECTED] writes: From: Tasos Parisinos [EMAIL PROTECTED] This patch adds module rsa.ko in the kernel (built-in or as a kernel module) and offers an API to do fast modular exponentiation, using the Montgomery algorithm; thus the exponentiation is not generic but can be used only when the modulus is odd, such as RSA public/private key pairs. This module is the computational core (using multiple precision integer arithmetic) and does not provide any means to do key management, implement padding schemes, etc., so the calling code should implement all those as needed. Signed-off-by: Tasos Parisinos [EMAIL PROTECTED] You forgot to answer the most important question. Who would want to use RSA in the kernel, and why? -Andi The main purpose behind the creation of this module was to create the cryptographic infrastructure to develop an in-kernel system of signed modules. I don't really see why this has to be in the kernel, even after reading your text below. This would be code which a tiny number of users would find useful, which someone in the future might find exploitable, and which performs a function that can be done in user space. The best environment to deploy such functionality is in updating, by remote, executable code (programs, libs and modules) on embedded devices running Linux that have some form of kernel physical security, so one can't tamper with the kernel, but can read it. In this case only a public key would be revealed. The vendor of the devices can sign and distribute/update executable code to the devices, and the kernel will not load/run any of it if it doesn't match its signature. The signature can be embedded in the elf, so this system is portable and centralized. Although this functionality can be achieved using userland helper programs, this may create the need to physically secure entire filesystems, which adds to the cost of developing such devices. 
So to save cost on your end you want to make this feature part of the mainline kernel. Am I misreading your intent here? In such cases one needs to use asymmetric cryptography, because in the case of symmetric it would be very easy to give away the key and end up having all your devices attacked. Which makes a good argument for doing asymmetric anyway, it would seem. That way any updates can be checked off the target machine and validated as authentic. There are already some systems that implement and utilize such functionality that use windows platforms, and other Linux distros that use userland programs to do so, assuming physical security of the host computer. Exactly. Moreover a similar system that would use hashes is easier to break and more difficult to update each time new code must be loaded to the host devices. See also this thread http://lkml.org/lkml/2007/3/19/447 Having said all this, we have a boatload of other crypto in the kernel; if it's just the crypto module, like aes, anubis or michael_mic, and is GPL compatible, some people may agree. But if this is an embedded system, and you have the patch, why not just apply it to your kernel and forget mainline? -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
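As background for what such a module actually computes: stripped of key management and padding, RSA signing and verification are bare modular exponentiation, which the Montgomery code accelerates for odd moduli. A toy illustration using Python's built-in three-argument pow; the 16-bit textbook key below is for illustration only and is far too small for real use:

```python
# Textbook RSA signature check as bare modular exponentiation.
# The patch's Montgomery core computes the same m = s^e mod n,
# just fast, in multiple-precision arithmetic, and for odd n.
n = 3233   # p*q = 61*53 (toy modulus)
e = 17     # public exponent
d = 2753   # private exponent: e*d = 1 (mod lcm(60, 52))

digest = 42                       # stand-in for a message hash
signature = pow(digest, d, n)     # signing side (private key)
recovered = pow(signature, e, n)  # verifying side (public key)

print(recovered == digest)  # True
```

Real use additionally needs a padding scheme such as PKCS#1, which, as the patch description says, it deliberately leaves to the calling code.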
Re: Lower HD transfer rate with NCQ enabled?
Paa Paa wrote: Q: What conclusion can I make on hdparm -t results or can I make any conclusions? Do I really have lower performance with NCQ or not? If I do, is this because of my HD or because of the kernel? What IO scheduler are you using? If AS or CFQ, could you try with deadline? I was using CFQ. I now tried with Deadline and that doesn't seem to degrade the performance at all! With Deadline I got 60MB/s both with and without NCQ. This was with hdparm -t. So what does this tell us? It suggests that it's time to test with real load and see if deadline works well for you in the general case. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: Performance Stats: Kernel patch
[EMAIL PROTECTED] wrote: 2) It arrived here with some line-wrapping damage, most likely due to the fact that you posted it with Thunderbird. There's a mystic Thunderbird incantation to make it not do that, but I have no idea what it is - it's in the list archives someplace. I don't use TBird (seamonkey fan) but I assume the patch can just be attached rather than inlined. Some mailers are pretty arcane otherwise. I do like the idea, but the issue of things which parse /proc/PID/status hasn't had comments. A good parser would ignore what it didn't understand, or take everything, but not everyone has a good parser. ;-) -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: [2.6 patch] the overdue removal of X86_SPEEDSTEP_CENTRINO_ACPI
Adrian Bunk wrote: This patch contains the overdue removal of X86_SPEEDSTEP_CENTRINO_ACPI. Signed-off-by: Adrian Bunk [EMAIL PROTECTED] It would be really nice, when removing features used on computers which are only a few years old, if you noted what replaces the functionality. Yes, people can take 10-15 minutes to find and read the previous discussion, but one or two sentences would generate less concern and noise on the list. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: [PATCH resend][CRYPTO]: RSA algorithm patch
Indan Zupancic wrote: On Fri, April 6, 2007 23:30, Bill Davidsen wrote: Tasos Parisinos wrote: The main purpose behind the creation of this module was to create the cryptographic infrastructure to develop an in-kernel system of signed modules. Although this functionality can be achieved using userland helper programs this may create the need to physically secure entire filesystems which adds to the cost of developing such devices. So to save cost on your end you want to make this feature part of the mainline kernel. Am I misreading your intent here? (Tasos was talking about the cost of securing whole file systems versus only the kernel binary.) But if that entire filesystem is initramfs, I don't see any problem. If it fits into the kernel, it also has enough room for an initramfs with a user space program doing the RSA signing. I said this before, so please look up how initramfs works and tell us why that isn't sufficient for this case. I suspect your answer will be because it isn't the only part and a lot of other infrastructure is needed in the kernel to do all the binary signing. But that code you didn't post, only an MPI module, however nice, which is only a partial solution to what you want to achieve. Combine that with the kernel policy of not merging unused code, and you're in the current situation. Having said all this, we have a boatload of other crypto in the kernel; if it's just the crypto module, like aes, anubis or michael_mic, and is GPL compatible, some people may agree. But if this is an embedded system, and you have the patch, why not just apply it to your kernel and forget mainline? Currently it's less than a cryptoapi module, as it only provides some functions to do multi-precision integer calculations, which happen to be the tricky part of implementing RSA. That said, this implementation seems quite good, from a code size and complexity point of view. 
So for that alone I think it wouldn't be bad to merge this or a modified version of this; even if it's unused by the rest of the kernel, it might be useful for other users. The burden of carrying it along for the kernel is quite small, while the code is worth something and might get improved by its users, in the end having a central place to collect such things. So I think from an open source ecological point of view, it wouldn't be bad to merge it. I see three possible ways forward (the alternative is the status quo): 1) Move it to user space (into the initramfs embedded into the kernel). But you'd still need to add binary (modules, libs and programs) load hooks. 2) Flesh it out into a ready to use, full blown RSA cryptoAPI module. Whatever you said earlier, whether you want it or not, it's just a block cipher, with the modulo as block size (I suspect there's some room for code simplification when assuming fixed block sizes too, by allocating blocksize * 2 space instead of resizing when needed). This would probably be the best solution, to provide most of the hooks while presenting the cryptoAPI for others to use if they wish. Good suggestion. 3) Go all the way, and post all the other kernel modifications too, to get the whole binary signing you want to achieve. The advantage will be that in the end you'll have something scrutinized to death. The disadvantage is that it will be scrutinized to death, and that can take a lot of time. Maybe you'll end up with a new LSM module, who knows? The list is in increasing order of difficulty and quality of your end code. It would help if you could find others who also want something similar and work together to get it into the kernel. But even if the last step fails, you'll still have had people reviewing your code. And failing even that, you at least shared your code with the rest of the world, which is already something good (and required by the GPL. But doing it in the open is much more laudable than hiding it on a website). 
Greetings, Indan I think you have covered the possibilities, my read is that your item number two is most likely to be accepted. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: [patch] remove artificial software max_loop limit
[EMAIL PROTECTED] wrote: On Fri, 06 Apr 2007 16:33:32 EDT, Bill Davidsen said: Jan Engelhardt wrote: Who cares if the user specifies max_loop=8 but still is able to open up /dev/loop8, loop9, etc.? max_loop=X basically meant (at least to me) have at least X loops ready. You have just come up with a really good reason not to do unlimited loops. That, and I'd expect the intuitive name for have at least N ready to be 'min_loop=N'. 'max_loop=N' means (to me, at least) If I ask for N+1, something has obviously gone very wrong, so please shoot my process before it gets worse. Maybe what's needed is *both* a max_ and min_ parameter? I think that max_loop is a sufficient statement of the highest number of devices needed, and can reasonably be interpreted as both I may need this many and I won't legitimately want more. As I recall, memory is allocated as the device is set up, so unless you want to use the max memory at boot, just in case, the minimum won't be guaranteed anyway. Something else could eat memory. In practice I think asking for way too many is more common than not being able to get to the max. It may happen but it's a corner case, and status is returned. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979
Re: need help
vjn wrote: In my project I want to code the kernel such that when I plug in my USB device it should ask for a password and check it in kernel space. Can anyone help me? I think the correct solution is to use an encrypted mount, and issue the mount command manually with the question in user space. There's no code to ask for input, nor any way to positively decide which connected terminal is the terminal to ask. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: init's children list is long and slows reaping children.
Ingo Molnar wrote: * Linus Torvalds [EMAIL PROTECTED] wrote: On Fri, 6 Apr 2007, Davide Libenzi wrote: or lets just face it and name it what it is: process_struct ;-) That'd be fine too! Wonder if Linus would swallow a rename patch like that... I don't really see the point. It's not even *true*. A process includes more than the shared signal-handling - it would include files and fs etc too. So it's actually *more* correct to call it the shared signal state than it would be to call it process state. we could call it structure for everything that we know to be ugly about POSIX process semantics ;-) The rest, like files and fs we've abstracted out already. Ingo So are you voting for ugly_struct? ;-) I do think this is still waiting for a more descriptive name, like proc_misc_struct or some such. Kernel code should be treated as literature, intended to be both read and readable. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: REISER4 FOR INCLUSION IN THE LINUX KERNEL.
[EMAIL PROTECTED] wrote: I actually believe that Hans made a reasonable case that Reiser4 had gone about as far as it could reasonably go with regard to testing, robustness, ... without the broader base of use that even an experimental filesystem in the distribution tree would get. Of course, this is an entirely reasonable request of Reiser's. It was met with an array of unreasonable actions, but mainly STALLING, which has led to REISER4 never becoming part of the main kernel. It has also led to many false claims about REISER4. Claims that are never backed up with solid statistics, but used to keep REISER4 out of the kernel and tar its reputation. Keep that last sentence in mind for four lines... I for one would at least play with it if it were in the distribution tree. I AM SURE THERE ARE A HUGE NUMBER OF PEOPLE WHO WOULD GIVE IT A TRY. Claims that are never backed up with solid statistics, ... As far as I could tell, Hans did pretty much everything else that was demanded. Hans eventually caved and provided - albeit with much pissing and moaning, and holier-than-thou rhetoric. It was not his pissing and moaning, etc.,... these were just excuses to keep REISER4 from succeeding. The truth is that any excuse would do. The real reasons are financial and backed by big money (sometimes, big egos). Yes, all of the people who make millions on the other filesystems! Wait... identify who makes a penny more or less with or without Reiser4. [ snip ] Until Namesys is stable there's no support team. It's not my impression that there's much support otherwise. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: init's children list is long and slows reaping children.
Davide Libenzi wrote: On Mon, 9 Apr 2007, Linus Torvalds wrote: On Mon, 9 Apr 2007, Kyle Moffett wrote: Maybe struct posix_process is more descriptive? struct process_posix? Ugly POSIX process semantics data seems simple enough to stick in a struct name. struct uglyposix_process? Guys, you didn't read my message. It's *not* about process stuff. Anything that tries to call it a process is *by*definition* worse than what it is now. Processes have all the things that we've cleanly separated out for filesystem, VM, SysV semaphore state, namespaces etc. The struct signal_struct is the random *leftovers* from all the other stuff. It's *not* about processes. Never has been, and never will be. I proposed struct task_shared_ctx but you ducked :) Descriptive, correct, I like it! -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: Performance Stats: Kernel patch
Eric Dumazet wrote: On Wed, 11 Apr 2007 15:59:16 +0400 Maxim Uvarov [EMAIL PROTECTED] wrote: Patch adds Process Performance Statistics. It makes available to the user the following new per-process (thread) performance statistics: * Involuntary Context Switches * Voluntary Context Switches * Number of system calls This data is useful for detecting hyperactivity patterns between processes. Your description is not very clear about the semantics of your stats. You currently return stats only for threads (not the process, as you claimed). I'm not sure if you were confused by his use of thread in parentheses, but isn't the whole point of this to see which threads are doing what? Or am I misreading his result as intentional? Please check kernel/sys.c:k_getrusage() to see how getrusage() has to sum a *lot* of individual fields to get precise process numbers (even counting stats for dead threads) -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979
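On the parsing concern raised in the earlier message: /proc/PID/status is line-oriented "Key:\tvalue" text, so a tolerant parser that skips anything it doesn't recognize is only a few lines. A sketch follows; the context-switch field names shown are the ones later kernels settled on (voluntary_ctxt_switches / nonvoluntary_ctxt_switches) and are assumptions here, since this thread predates their final form:

```python
def parse_status(text):
    """Parse /proc/PID/status-style "Key:\tvalue" lines into a dict.
    Lines that don't look like key/value pairs are silently skipped -
    the 'ignore what you don't understand' behavior argued for above."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(':')
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields

# Sample fragment; field names assumed from later kernels.
sample = "Name:\tbash\nvoluntary_ctxt_switches:\t150\nnonvoluntary_ctxt_switches:\t7\n"
status = parse_status(sample)
print(status["voluntary_ctxt_switches"])  # 150
```

A parser built this way keeps working when new keys are appended to the file, which is exactly why adding fields to /proc/PID/status is (mostly) backward compatible.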
Re: init's children list is long and slows reaping children.
Oleg Nesterov wrote: On 04/10, Eric W. Biederman wrote: I'm trying to remember what the story is now. There is a nasty race somewhere with reparenting, a threaded parent setting SIGCHLD to SIGIGN, and non-default signals that results in a zombie that no one can wait for and reap. It requires being reparented twice to trigger. reparent_thread: ... /* If we'd notified the old parent about this child's death, * also notify the new parent. */ if (!traced && p->exit_state == EXIT_ZOMBIE && p->exit_signal != -1 && thread_group_empty(p)) do_notify_parent(p, p->exit_signal); We notified /sbin/init. If it ignores SIGCHLD, we should release the task. We don't do this. The best fix I believe is to clean up the forget_original_parent/reparent_thread interaction and factor out these exit_state == EXIT_ZOMBIE && exit_signal == -1 checks. As long as the original parent is preserved for getppid(). There are programs out there which communicate between the parent and child with signals, and if the original parent dies, it is undesirable to have the child getppid() and start sending signals to a program not expecting them. Invites undefined behavior. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: Add a norecovery option to ext3/4?
Eric Sandeen wrote:
> Phillip Susi wrote:
>> Eric Sandeen wrote:
>>> In that case you are mounting the same filesystem under 2 different operating systems simultaneously, which is, and always has been, a recipe for disaster. Flagging the fs as mounted already would probably be a better solution, though it's harder than it sounds at first glance.
>>
>> No, it has not been. Prior to poorly behaved journal playback, it was perfectly safe to mount a filesystem read-only even if it was mounted read-write by another system (possibly fsck or defrag). You might not read the correct data from it, but you would not damage the underlying data simply by mounting it read-only.
>
> You might not damage the underlying filesystem, but you could sure go off in the weeds trying to read it, if you stumbled upon some half-updated metadata... so while it may be safe for the filesystem, I'm not convinced that it's safe for the host reading the filesystem.

Exactly. If the data are protected you can use other software to access it. For ext3 an explicit ext2 mount might do it... but if you corrupt the underlying information, there's no going back. In practice Linux has had lots of practice mounting garbage, and isn't likely to suffer terminal damage.

I wonder what happens if the device is really read-only and the o/s tries to replay the journal as part of a r/o mount? I suspect the system will refuse totally with an i/o error, not what you want.

--
Bill Davidsen [EMAIL PROTECTED]
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: init's children list is long and slows reaping children.
Eric W. Biederman wrote:
> Bill Davidsen [EMAIL PROTECTED] writes:
>> As long as the original parent is preserved for getppid(). There are programs out there which communicate between parent and child with signals, and if the original parent dies, it is undesirable to have the child call getppid() and start sending signals to a program not expecting them. That invites undefined behavior.
>
> Then the programs are broken. getppid is defined to change if the process is reparented to init. The short answer is that kthreads don't do this so it doesn't matter.

But user programs are NOT broken: currently getppid() returns either the original parent or init, and a program can identify init. Reparenting to another pid would not be easily noticed, and as SUS notes, no return value is reserved to indicate an error. So there's no way to check, and no need for kthreads; I was prematurely paranoid. Presumably user processes will still be reparented to init, so that's not an issue. Since there's no atomic signal_parent(), the ppid could change between getppid() and signal(), but that's never actually been a problem AFAIK.

Related: is there a benefit from having separate queues for original children of init and tasks reparented to init? Even on a server, would there be enough to save anything?

--
Bill Davidsen [EMAIL PROTECTED]
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: sched_yield proposals/rationale
[EMAIL PROTECTED] wrote:
> -----Original Message-----
>> Besides - but I guess you're aware of it - any randomized algorithms tend to drive benchmarkers and performance analysts crazy because their performance cannot be repeated. So it's usually better to avoid them unless there is really no alternative. That could already solve your concern from above.
>
> Statistically speaking, it will give them (benchmarkers) the smoothest curve they've ever seen. Please be aware that I'm just exploring options/insight here. It is not something I intend to push into the mainline kernel. I just want to find reasonable and logical criticism, as you and some others have provided already. Thanks for that!

And having gotten same, are you going to code up what appears to be a solution, based on this feedback? I'm curious how well it would run poorly written programs, having recently worked with a company which seemed to have a whole part of purchasing dedicated to buying same. :-(

--
Bill Davidsen [EMAIL PROTECTED]
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: [ANNOUNCE] DualFS: File System with Meta-data and Data Separation
Juan Piernas Canovas wrote:
> The point of all the above is that you must improve the common case, and manage the worst case correctly.

That statement made it into my quote file. Of course, "correctly" hopefully means getting to the desired behavior without a performance hit so bad that it becomes a jackpot case: correct in result but too slow to be useful.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Re: [PATCH 000 of 6] md: Assorted fixes and features for md for 2.6.21
NeilBrown wrote: Following 6 patches are against 2.6.20 and are suitable for 2.6.21. They are not against -mm because the new plugging makes raid5 not work and so not testable, and there are a few fairly minor intersections between these patches and those patches. There is also a very minor conflict with the hardware-xor patches - one line of context is different. Patch 1 should probably go in -stable - the bug could cause data corruption in a fairly uncommon raid10 configuration, so that one and this intro are Cc:ed to [EMAIL PROTECTED] Thanks, NeilBrown [PATCH 001 of 6] md: Fix raid10 recovery problem. [PATCH 002 of 6] md: RAID6: clean up CPUID and FPU enter/exit code [PATCH 003 of 6] md: Move warning about creating a raid array on partitions of the one device. [PATCH 004 of 6] md: Clean out unplug and other queue function on md shutdown [PATCH 005 of 6] md: Restart a (raid5) reshape that has been aborted due to a read/write error. [PATCH 006 of 6] md: Add support for reshape of a raid6 Every month or so there are a bunch of patches like this, which do various enhancements to the kernel. And these are usually based against the release kernel, and all is fine. But every once in a while there is a patch which is more urgent, in this case the RAID10 one, which is really desirable to get into every kernel running on a machine. Are patches marked as needed for -stable also fast tracked to -git inclusion? If this isn't in -git14 I'm going to rebuild with it before testing Neil's NFS stuff. The NFS server test data is on RAID10 ;-) -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 006 of 6] md: Add support for reshape of a raid6
Andrew Morton wrote:
> On Tue, 20 Feb 2007 17:35:16 +1100 NeilBrown [EMAIL PROTECTED] wrote:
>> + for (i = conf->raid_disks ; i-- ; ) {
>
> That statement should be dragged out, shot, stomped on, then ceremonially incinerated. What's wrong with doing
>
>     for (i = 0; i < conf->raid_disks; i++) {
>
> in a manner which can be understood without alcoholic fortification?

I don't find either hard to read, but your suggestion isn't equivalent, since it increments rather than decrements the index. I admit I probably would write it the same way Neil did...

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Re: 2.6.21-rc1: framebuffer/console boot failure
Andrew wrote: I have just discovered 2.6.21-rc1 boots with pci=noacpi ... Try setting the resolution and frame rate, video=XXX:[EMAIL PROTECTED] or such. Worked for me. I like pci=noacpi, though ;-) -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [QUESTION] Sata RAID
Patrick Ale wrote: On 2/24/07, Patrick Ale [EMAIL PROTECTED] wrote: On 2/24/07, Michael-Luke Jones [EMAIL PROTECTED] wrote: One more question regarding this, I am aware its not *really* kernel related but answering this question now will save yourself a lot of bogus emails from me about MD oopses later and all, and I want to setup my disks right once and for all and never witness what I witnessed last weeks with my ATA disks. Would you use MD at all, taking in account the disks come from the same batch and all? I hear these things about MD/RAID being pointless when you use disks from the same brand/type/batch since they most likely will break shortly after each other. Well, for values of shortly in months in most cases. These are consumer goods, I would not expect units with consecutive serial numbers to fail separated by such a short time that you can't do a backup and/or replace and rebuild. If quality control were so good they are likely to fail at the same time it would be so good they would be obsolete before they failed. That urban myth is a good reason to do backups, but a bad reason to avoid RAID. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: SMP performance degradation with sysbench
Paulo Marques wrote:
> Rik van Riel wrote:
>> J.A. Magallón wrote:
>>> [...] It's the same to answer 4+4 queries as 8 at half the speed, isn't it?
>>
>> That still doesn't fix the potential Linux problem that this benchmark identified. To clarify: I don't care as much about MySQL performance as I care about identifying and fixing this potential bug in Linux.
>
> IIRC a long time ago there was a change in the scheduler to prevent a low-prio task running on a sibling of a hyperthreaded processor from slowing down a higher-prio task on another sibling of the same processor. Basically the scheduler would put the low-prio task to sleep for an adequate time slice to allow the other sibling to run at full speed for a while. I don't know the scheduler code well enough, but comments like this one make me think that the change is still in place:
>
>     /*
>      * If an SMT sibling task has been put to sleep for priority
>      * reasons reschedule the idle task to see if it can now run.
>      */
>     if (rq->nr_running) {
>         resched_task(rq->idle);
>         ret = 1;
>     }
>
> If that is the case, turning off CONFIG_SCHED_SMT would solve the problem.

That may be the case, but in my opinion, if this helps it doesn't solve the problem, because the real problem is that a process which is not on an HT sibling is being treated as if it were. Note that Intel does make multicore HT processors, and hopefully when this code works as intended it will result in more total throughput. My supposition is that it currently is NOT working as intended, since disabling SMT scheduling is reported to help. A test with MC on and SMT off would be informative for where to look next.

--
Bill Davidsen [EMAIL PROTECTED]
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: latencies due to disk writes
[EMAIL PROTECTED] wrote:
> Hello! I'm experiencing extreme lags during disk writes. I have read somewhere (didn't save the URI, sigh) that this is actually related to bad (non-existing) write io priorities (CFQ only manages file reads). I could imagine two quick, easy and probably quite effective ways to prevent such lags:
>
> 1.) don't flush buffers to disk at once more than necessary.

Actually, in many cases this is just what you do want, to avoid filling memory with buffered writes and then flushing them on a timer or memory runout. Investigate the /proc/sys/vm/dirty_* values.

> 2.) relate CPU niceness to max write buffer fill level (ie. the point where it gets forced to be flushed to disk -- a conservative estimate would be much better than nothing): (100-5*nicelevel)%, ie. writes for processes having nice level 19 are blocked/delayed until the write buffer is below 5%. That way, the accounting is done at a higher and probably easier to access level.
>
> Maybe I'm just talking nonsense, but nonetheless, here are my 2 cents. Best regards, Mark
>
> p.s. please CC me as I'm not subscribed to this list.

--
Bill Davidsen [EMAIL PROTECTED]
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: 2.6.20-rc1: CIFS cheers, NFS4 jeers
Florin Iucha wrote: Hello, it's me and my 70 GB of photos again. I have tested both CIFS and NFSv4 clients in kernel 2.6.20-rc1 . CIFS passed with flying colors and NFSv4 stalled after 7 GB. Configuration: Server: PIII/1GHz, 512 MB RAM, Debian testing, distro kernel 2.6.18-3-vserver-686, Intel E1000 NIC, filesystem 170 GB ext3 with default mkfs values on a SATA disk Client: AMD x2 4200+, 2 GB RAM, Debian testing/unstable kernel 2.6.20-rc1, Marvell SKGE onboard, filesystem 120 GB ext3 with default mkfs values on a SATA disk Neil has been diddling NFS, I did some light testing with 2.6.20-git14 with 190GB of mp3 and mpg files (library of congress folk music) without hangs. Just did it work tests, copy 20-30GB to server, do md5 on the data pulled back from the server. Didn't hang, performance testing later. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: bug in kernel 2.6.21-rc1-git1: conventional floppy drive cannot be mounted without hanging up the whole system
Linus Torvalds wrote: On Mon, 26 Feb 2007, Rene Herman wrote: Other than these two, ECP parallel ports are the other remaining users. Now, even though on a machine that still has a parallel port it might usually indeed be set to ECP in its BIOS; having anything attached to the port also use it as such seems quite seldom. Well, if it's some kind of cache coherency problem (the same way much more modern CPU's have cache coherency issues with DMA during C3 sleep), then it's entirely possible that the normal ECP parallel port behaviour would never show it, since most people tend to use it for output only (yeah, I realize you can use it bidirectionally, but at least on old hardware it tends to be talk AT printer rather than talk WITH printer. The bidirectional use is/was PL/IP, aka laplink connections. Yes, I still have a machine I installed that way, and it will run 2.2.19 forever before I try it again. ;-) I frankly forget what hardware platforms had problems with the DMA thing, and what the exact behaviour even was (I think there was some possibility of corrupt data on the floppy). We also used to have the nohlt flag to turn off hlt entirely, and that was due to some other legacy issues, iirc. I seriously doubt we will ever see anybody who has this problem ever again, but on the other hand, I also seriously doubt that most modern machines even *have* a floppy drive any more, so I'd rather not even change it. It's just not worth even a miniscule risk.. Thank you. Linus -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc1: CIFS cheers, NFS4 jeers
Florin Iucha wrote: On Tue, Feb 27, 2007 at 09:36:23PM -0500, Bill Davidsen wrote: Florin Iucha wrote: Hello, it's me and my 70 GB of photos again. I have tested both CIFS and NFSv4 clients in kernel 2.6.20-rc1 . CIFS passed with flying colors and NFSv4 stalled after 7 GB. Neil has been diddling NFS, I did some light testing with 2.6.20-git14 with 190GB of mp3 and mpg files (library of congress folk music) without hangs. Just did it work tests, copy 20-30GB to server, do md5 on the data pulled back from the server. Didn't hang, performance testing later. 2.6.20-rcX used to copy all files then hang on certain operations that I think used the VFS. 2.6.21-rc1 stalls the NFS transfer itself after several GB. The data was never corrupted. Have you tried copying _ALL_ 190 GB to the server? No, but as noted I was doing 20-30GB, so if several is a small number I'm not seeing that behavior. I'm using a Gbit connection if that is not the same as your setup. I have additional testing queued for time available this week. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] affinity is not defined in non-smp kernels - i386
Eric W. Biederman wrote: Fernando Luis Vázquez Cao [EMAIL PROTECTED] writes: Initialize affinity only when building SMP kernels. Reasonable. I goofed here. However I would prefer my patch that just deletes these problem lines. These lines don't really contribute anything and are harmless to remove. Where is the initialization performed, then? -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] md: Fix for raid6 reshape.
Neil Brown wrote:
> On Thursday March 1, [EMAIL PROTECTED] wrote:
>> On Fri, 2 Mar 2007 15:56:55 +1100 NeilBrown [EMAIL PROTECTED] wrote:
>>> - conf->expand_progress = (sector_nr + i)*(conf->raid_disks-1);
>>> + conf->expand_progress = (sector_nr + i) * new_data_disks);
>>
>> ahem.
>
> It wasn't like that when I tested it, honest... But the original got caught up with some other changes which were not really related, so I removed them all and just made this change manually and totally messed it up (again). Sorry. Of course it should be
>
> + conf->expand_progress = (sector_nr + i) * new_data_disks;

Will the (real) fix be in 2.6.21?

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Re: Question: 20 microseconds delay
Mockern wrote:
> The problem is that I need to use the wait_event_timeout function

AFAIK you can't do that; it's just not the right tool for the job. You can always use a single jiffy and be sure you will wait at least 20us, or use usleep. No matter how much you need to use your snowblower, it won't mow your lawn.

--
Bill Davidsen [EMAIL PROTECTED]
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: bugs in kernel 2.6.21 (both two release candidates) and kernel 2.6.20
Uwe Bugla wrote:
> Hi folks,
>
> the floppy mount error I mentioned is gone now in 2.6.21-rc2, and my kernel is smaller. Good decision to rip out Stephane's stuff, Linus! As I did not get a reply from Andrew, I hope that the buggy stuff residing in 2.6.20-mm1 (freezing my apache services - I already mentioned the problem some days ago - mm2 I did not try yet) will never be pushed into vanilla mainline.
>
> I own some old CDROM and CDRW devices manufactured by TEAC (bought sometime in 1999): CDR 540 and CDRW 54. Those old CD devices sometimes get confused, with drive seek errors and status errors shown in dmesg. The newer DVD devices (LG reading device and Yamakawa burning device) do not show those errors at all.
>
> As I finished an enormous project 6 weeks ago (transforming some 500 audio CDs to MP3 format with kaudiocreator and lame 3.97 (320 kbit quality - preset insane) and then burning the material on DVDs), those old devices were an incredible help in some cases where the newer DVD devices refused to read some audio CDs without errors. That's why I do not want to kick them off at all. Never had those troubles with kernel 2.6.19 and former ones.

Other than wanting to stay current, is there a reason why you need to go to a newer kernel to do this process? Would it be an option just to run on a kernel which works, for the moment? Assuming that you don't want to use these drives unless you can't read a CD any other way, would it be practical to (a) move these drives to another machine and run an old kernel, (b) try a newer CD (not DVD) reader, or (c) install one or both of these antiques in a USB external enclosure, which would allow you to reinsert the drive rather than reboot? It may take a while for this problem to be identified; I doubt there are many around for developers to test.
If I have one in the old junk closet anyone is welcome to it, but I have donated a lot of built from parts machines to various people and causes, so anything that old is unlikely to be found. Dmesg 1 says on my AMD machine with a CDR540 as /dev/hdd during boot process: hdd: media error (bad sector): status=0x51 { DriveReady SeekComplete Error } hdd: media error (bad sector): error=0x34 { AbortedCommand LastFailedSense=0x03 } ide: failed opcode was: unknown ATAPI device hdd: Error: Medium error -- (Sense key=0x03) (reserved error code) -- (asc=0x02, ascq=0x00) The failed Read 10 packet command was: 28 00 00 00 00 10 00 00 02 00 00 00 00 00 00 00 end_request: I/O error, dev hdd, sector 64 Buffer I/O error on device hdd, logical block 8 hdd: media error (bad sector): status=0x51 { DriveReady SeekComplete Error } hdd: media error (bad sector): error=0x34 { AbortedCommand LastFailedSense=0x03 } ide: failed opcode was: unknown ATAPI device hdd: Error: Medium error -- (Sense key=0x03) (reserved error code) -- (asc=0x02, ascq=0x00) The failed Read 10 packet command was: 28 00 00 00 00 10 00 00 02 00 00 00 00 00 00 00 end_request: I/O error, dev hdd, sector 64 Buffer I/O error on device hdd, logical block 8 But even more crucial is this one: Dmesg 2 says on the Intel machine with a TEAC CDRW54 as /dev/hdd: hdd: status error: status=0x7f { DriveReady DeviceFault SeekComplete DataRequest CorrectedError Index Error } hdd: status error: error=0x7f { IllegalLengthIndication EndOfMedia AbortedCommand MediaChangeRequested LastFailedSense=0x07 } ide: failed opcode was: unknown For about 1 second the whole system hangs while /dev/hdd is executing some kind of reinitialization, just like as if you unconnect the data and the 12 V / 6V cable and reconnect them again while the machine is up and running. For a DVB-S record f. ex. the breakdown of the recording can be one consequence. Question: Can someone reading this please confirm these errors? 
Please take old CD devices to find out, not newer ones or even DVD devices! I am using the standard IDE driver with the following chipsets: Intel ICH4 and SIS 5513. And please take time, as these crucial errors do not happen immediately, but about 4 times in about 8 - 10 hours while the machine is up and running. Yours sincerely and thanks for all your efforts Uwe P. S.: I do not think this is a hardware error as I did not have those problems with kernels = 2.6.19. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] libata: warn if speed limited due to 40-wire cable (v2)
Alan Cox wrote: it seems broken to manipulate xfer_mask after returning from the driver's -mode_filter hook. this patch is more than just a speed-limited warning printk, afaics I actually suggested that order because the only way the printk can be done correctly is for it to be the very last test made. Since the mode filter is not told what mode will be used but just subtracts modes that are not allowed this should be safe. Far better to have a drive which works slowly than one which works unreliably. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] libata: warn if speed limited due to 40-wire cable (v2)
Stephen Clark wrote: Bill Davidsen wrote: Alan Cox wrote: it seems broken to manipulate xfer_mask after returning from the driver's -mode_filter hook. this patch is more than just a speed-limited warning printk, afaics I actually suggested that order because the only way the printk can be done correctly is for it to be the very last test made. Since the mode filter is not told what mode will be used but just subtracts modes that are not allowed this should be safe. Far better to have a drive which works slowly than one which works unreliably. That would be true if the 40 wire detection was 100% accurate! The statement is completely correct, even though the detection may not be. ;-) With the current set(s) of patches to do better detection, cable evaluation should be better. But even if not, a slow system is more useful than one which doesn't work, crashes because of swap i/o errors, etc. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc2-mm1
Neil Brown wrote:
> On Sunday March 4, [EMAIL PROTECTED] wrote:
>> On Mon, 5 Mar 2007 01:11:33 +0100 J.A. Magallón [EMAIL PROTECTED] wrote:
>>> On Fri, 2 Mar 2007 03:00:26 -0800, Andrew Morton [EMAIL PROTECTED] wrote:
>>>> Temporarily at http://userweb.kernel.org/~akpm/2.6.21-rc2-mm1/ Will appear later at ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc2/2.6.21-rc2-mm1/
>>>
>>> nfs blocks shutdown and reboot. If I try to do 'service nfs stop', the box hangs, no login, no SysRQ-T or P, S-U-B works at least.
>>
>> The bug was added by knfsd-use-recv_msg-to-get-peer-address-for-nfsd-instead-of-code-copying.patch.
>
> Bother... Looks like I need a MSG_DONTWAIT in there, don't I. I'll resend.

Crap, that's probably in 2.6.20-git14 with all the NFS stuff I thought I had checked before trusting. At least there is a bug found, no more "works for me" reports. I'll revert to an FC6 kernel before load gets high this morning.

--
Bill Davidsen [EMAIL PROTECTED]
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: [patch] disable NMI watchdog by default
Ingo Molnar wrote: From: Ingo Molnar [EMAIL PROTECTED] Subject: [patch] disable NMI watchdog by default there's a new NMI watchdog related problem: KVM crashes on certain bzImages because ... we enable the NMI watchdog by default (even if the user does not ask for it) , and no other OS on this planet does that so KVM doesnt have emulation for that yet. So KVM injects a #GP, which crashes the Linux guest: general protection fault: [#1] PREEMPT SMP Modules linked in: CPU:0 EIP:0060:[c011a8ae]Not tainted VLI EFLAGS: 0246 (2.6.20-rc5-rt0 #3) EIP is at setup_apic_nmi_watchdog+0x26d/0x3d3 and no, i did /not/ request an nmi_watchdog on the boot command line! Solution: turn off that darn thing! It's a debug tool, not a 'make life harder' tool!! with this patch the KVM guest boots up just fine. And with this my laptop (Lenovo T60) also stops its sporadic hard hanging (sometimes in acpi_init(), sometimes later during bootup, sometimes much later during actual use) as well. It hung with both nmi_watchdog=1 and nmi_watchdog=2, so it's generally the fact of NMI injection that is causing problems, not the NMI watchdog variant, nor any particular bootup code. The patch is unintrusive. I'm missing something, what limits this to systems running under kvm? -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [5/6] 2.6.21-rc2: known regressions
Ingo Molnar wrote:
> * Adrian Bunk [EMAIL PROTECTED] wrote:
>> Subject: i386: no boot with nmi_watchdog=1 (clockevents)
>> References: http://lkml.org/lkml/2007/2/21/208
>> Submitter: Daniel Walker [EMAIL PROTECTED]
>> Caused-By: Thomas Gleixner [EMAIL PROTECTED] commit e9e2cdb412412326c4827fc78ba27f410d837e6e
>> Handled-By: Thomas Gleixner [EMAIL PROTECTED]
>> Status: problem is being debugged
>
> FYI, this is not a "won't boot" problem, this should be a "NMI watchdog does not work" problem - which has far lower severity. Also, Thomas did a fix for this which is now in -mm.

If a system normally runs a watchdog, and some do, then NMI would be forced on by grub.conf and the system would not boot. And if the system was counting on NMI to look for a hang problem, "NMI does not work" would be a real problem if the failure was silent. Actually, a lack of NMI would be worse than not booting; it would be a time bomb waiting for a bad moment to hang.

--
Bill Davidsen [EMAIL PROTECTED]
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
jos poortvliet wrote:

Op Sunday 04 March 2007, schreef Willy Tarreau:

Hi Con ! This was designed to be robust for any application since linux demands a general purpose scheduler design, while preserving interactivity, instead of optimising for one particular end use. Well, I haven't tested it yet, but your design choices please me. As you know, I've been one of those encountering big starvation problems with the original scheduler, making 2.6 unusable for me in many situations. I welcome your work and want to thank you for the time you spend trying to fix it. Keep up the good work, Willy PS: I've looked at your graphs, I hope you're on the way to something really better than the first 21 2.6 releases!

Well, imho his current staircase scheduler already does a better job compared to mainline, but it won't make it in (or at least, it's not likely). So we can hope this WILL make it into mainline, but I wouldn't count on it.

Wrong problem: what is really needed is to get CPU scheduler choice into mainline, just as i/o scheduler choice finally did. Con has noted that for some loads this will present suboptimal performance, as will his -ck patches, as will the default scheduler. Instead of trying to make ANY one size fit all, we should have a means to select, at runtime, between any of the schedulers, and preferably to define an interface by which a user can insert a new scheduler in the kernel (compile in, I don't mean pluggable) with clear and well defined rules for how that can be done.
Re: [PATCH] libata: warn if speed limited due to 40-wire cable (v2)
Stephen Clark wrote:

Bill Davidsen wrote:

Stephen Clark wrote:

Bill Davidsen wrote:

Alan Cox wrote: it seems broken to manipulate xfer_mask after returning from the driver's -mode_filter hook. this patch is more than just a speed-limited warning printk, afaics

I actually suggested that order because the only way the printk can be done correctly is for it to be the very last test made. Since the mode filter is not told what mode will be used but just subtracts modes that are not allowed, this should be safe. Far better to have a drive which works slowly than one which works unreliably.

That would be true if the 40 wire detection was 100% accurate!

The statement is completely correct, even though the detection may not be. ;-) With the current set(s) of patches to do better detection, cable evaluation should be better. But even if not, a slow system is more useful than one which doesn't work, crashes because of swap i/o errors, etc.

I have had problems with cable detection on my previous laptop and my current laptop. It almost made my systems unusable. On my current laptop I was getting a throughput of a little over 1 MB/s instead of the 44 MB/s I get with udma set to the correct value. It took hours to upgrade my laptop from fc5 to fc6 because of this misdetection.

As far as I can see, if you are getting that low a speed, you have other problems. I have a system with old slow drives which are really on a 40 pin cable, and they run at UDMA(33). One of the experts in this can undoubtedly tell us more, but your system should run faster than that; mine does, and I really HAVE a 40 pin cable (and drive). If your system drops to PIO modes, I doubt cable is the only issue; I think there are other issues (acpi comes to mind).
Re: Raid 10 Problems?
Marc Perkel wrote:

--- Jan Engelhardt [EMAIL PROTECTED] wrote: On Mar 4 2007 19:17, Marc Perkel wrote: Thanks - because of your suggestion I had found the instructions. But you have some interesting options set. -N nicearray -b internal -e 1.0 Are these important? -N? What's in a name? I suppose, it's not so important. (Arrays are identified by their UUID anyway. But maybe udev can do something with the name someday as it does today with /dev/disk/*.) Not worth starting over for. -b internal -- seems like a good idea to speed up resynchronization. I'm trying to figure out what the default is. -e 1.0 -- I wonder why the new superblock format is not default in mdadm (0.90 is still used). Looks interesting for big arrays but not sure it's worth starting over for. Trying to get through a 2 hour sync using 4 500GB SATA 2 drives.

That's exactly why you want the bitmap. Fortunately you can add it after the array is created. Now the bad news: you should read and understand the meaning of the "far" layout. Part of the information is in the mdadm man page under -p, some in the md man page. Use of far layout will affect the performance of the array, the balance of read vs. write performance, and (maybe) the reliability. Two hours is a pretty short time to invest if you find that you have your layout wrong and would be better off for the life of the array with some other data layout. And the time to do the reading is worth it if you wind up convinced that the default settings are fine for you.
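For reference, a sketch of both suggestions (the device names here are hypothetical; mdadm syntax as of that era):

```shell
# Add an internal write-intent bitmap to an existing array; after this,
# a resync following an unclean shutdown only touches dirty regions.
mdadm --grow /dev/md0 --bitmap=internal

# When creating a raid10 array, -p/--layout selects near/far/offset;
# e.g. "f2" is the "far" layout with 2 copies discussed above.
mdadm --create /dev/md1 --level=10 --raid-devices=4 --layout=f2 \
      /dev/sd[abcd]1

# Check which layout an existing array actually uses:
mdadm --detail /dev/md0 | grep -i layout
```

These commands need root and real member devices, so they are illustrative rather than something to paste blindly.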
Re: [RFC] BadRAM still not ready for inclusion ? (was: Re: Free Linux Driver Development!)
imagine that such a feature would be possible? Where are those efforts for fixing/integrating fantastic cowloop? What about the badram/badmem patch? Compressed caching? Somebody helping with development of dm-loop, or extending loop.c to support more than 256 devices? Replacement of the proprietary, unstable and unelegant vmware-loop for being able to mount vmware .vmdk files? The internal spec for this is open; dm-userspace could be some infrastructure for this, but the author seems to have other priorities. dm-cow, zfs-fuse - anybody? Kernel based target for AoE (ATA over Ethernet)? (there are two independent implementations, but both got stuck at some early experimental stage) Just my 2 cents. Roland K. Sysadmin

ps: This isn't meant to criticise any of you kernel developers since you're doing fantastic work!
CONFIG_SYSFS_DEPRECATED and similar issues
Just a few comments on changes like this in general. Prompted by, but not otherwise related to, the subject.

When any new feature in the kernel requires significant changes in userspace, attention should be given to documenting the user items which must change. Now "attention" doesn't mean handwaving b.s. like "user programs which depend on sysfs will need to be modified." It means a list of which common user features need to be updated and where to find new ones. That's what the whole Documentation directory is for, but it's rarely used for such a purpose. In the most recent case, a user tool is needed which doesn't even exist as a release.

The other issue is to avoid trap door changes, which occur when a kernel change requires new user tools, and the user tools will not run with older kernels. That includes missing major features like having a display and being able to mount filesystems, even if the kernel is technically running. When there were stable and development kernel series that was not so much of an issue; now that every kernel is an adventure, it would be nice to ensure that testing a new kernel is not an irrevocable step.
Re: [2.6.22 patch] the scheduled removal of OBSOLETE_OSS options
Adrian Bunk wrote:

This patch contains the scheduled removal of the OBSOLETE_OSS options for 2.6.22.

If these are drivers for which there are thought to be useful ALSA drivers, would it be reasonable to leave a stub for a help file naming the driver which claims to support the hardware? I'm not objecting to the removal of the drivers, just noting that identifying the new drivers can be made easier.
Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
Gene Heskett wrote:

On Monday 05 March 2007, Nicolas Mailhot wrote: This looks like -mm stuff if you want it in 2.6.22

This needs to get to 2.6.21, it really is that big an improvement.

As Con pointed out, for some workloads and desired behaviour this is not as good as the existing scheduler. Therefore it should go in -mm and hopefully give the user an option to select which is appropriate. With luck I'll get to shake out that patch in combination with kvm later today.
Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
Willy Tarreau wrote:

On Tue, Mar 06, 2007 at 11:18:44AM +1100, Con Kolivas wrote: On Tuesday 06 March 2007 10:05, Bill Davidsen wrote: jos poortvliet wrote: Well, imho his current staircase scheduler already does a better job compared to mainline, but it won't make it in (or at least, it's not likely). So we can hope this WILL make it into mainline, but I wouldn't count on it.

Wrong problem: what is really needed is to get CPU scheduler choice into mainline, just as i/o scheduler choice finally did. Con has noted that for some loads this will present suboptimal performance, as will his -ck patches, as will the default scheduler. Instead of trying to make ANY one size fit all, we should have a means to select, at runtime, between any of the schedulers, and preferably to define an interface by which a user can insert a new scheduler in the kernel (compile in, I don't mean pluggable) with clear and well defined rules for how that can be done.

Been there, done that. Wli wrote the infrastructure for plugsched; I took his code and got it booting and ported 3 or so different scheduler designs. It allowed you to build as few or as many different schedulers into the kernel and either boot the only one you built into your kernel, or choose a scheduler at boot time. That code got permavetoed by both Ingo and Linus. After that I gave up on that code and handed it over to Peter Williams who still maintains it. So please note that I pushed the plugsched barrow previously and still don't think it's a bad idea, but the maintainers think it's the wrong approach.

In a way, I think they are right. Let me explain. Pluggable schedulers are useful when you want to switch away from the default one. This is very useful during development of a new scheduler, as well as when you're not satisfied with the default scheduler. Having this feature will incite many people to develop their own scheduler for their very specific workload, and nothing generic.
It's a bit what happened after all: you, Peter, Nick, and Mike have worked a lot trying to provide alternative solutions. But when you think about it, there are other OSes which have only one scheduler and which behave very well with tens of thousands of tasks and scale very well with lots of CPUs (eg: solaris). So there is a real challenge here to try to provide something at least as good and universal, because we know that it can exist. And this is what you finally did: work on a scheduler which ought to be good with any workload.

The problem is not with "any workload", because that's not the issue; the issue is the definition of "good" matching the administrator's policy. And that's where the problem comes in. We have the default scheduler, which favors interactive jobs. We have Con's staircase scheduler which is part of an interactivity package. We have the absolutely fair scheduler which is, well... fair, and keeps things smooth and under reasonable load crisp. There are other schedulers in the pluggable package; I did a doorknob scheduler for 2.2 (everybody gets a turn, special case of round-robin). I'm sure people have quietly hacked many more, which have never been presented to public view.

The point is that no one CPU scheduler will satisfy the policy needs of all users, any more than one i/o scheduler does so. We have realtime scheduling, preempt both voluntary and involuntary; why should we not have multiple CPU schedulers? If Linus has an objection to pluggable schedulers, then let's identify what the problem is and address it. If that means one scheduler or the other must be compiled in, or all compiled in and selected, so be it.

Then, when we have a generic, good enough scheduler for most situations, I think that it could be good to get the plugsched for very specific usages. People working in HPC may prefer to allocate resources differently, for instance.
There may also be people refusing to mix tasks from different users on two different siblings of one CPU for security reasons, etc... All those would justify a pluggable scheduler. But it should not be an excuse to provide a set of bad schedulers and no good one.

Unless you force the definition of "good" to what the default scheduler does, there can be no one good one. Choice is good; no one is calling for bizarre niche implementations, but we have at minimum three CPU schedulers which are best for a large number of users (current default, and Con's fair and interactive flavors, before you ask).

The CPU scheduler is often compared to the I/O schedulers, while in fact this is a completely different story. The I/O schedulers are needed because the hardware and filesystems may lead to very different behaviours, and the workload may vary a lot (eg: news server, ftp server, cache, desktop, real time streaming, ...). But at least, the default I/O scheduler was good enough for most usages, and alternative ones are here to provide optimal solutions
Re: [2.6.22 patch] the scheduled removal of OBSOLETE_OSS options
Adrian Bunk wrote:

On Tue, Mar 06, 2007 at 12:46:22PM -0500, Bill Davidsen wrote: Adrian Bunk wrote: This patch contains the scheduled removal of the OBSOLETE_OSS options for 2.6.22. If these are drivers for which there are thought to be useful ALSA drivers, would it be reasonable to leave a stub for a help file naming the driver which claims to support the hardware? I'm not objecting to the removal of the drivers, just noting that identifying the new drivers can be made easier.

People compiling their own kernels aren't completely dumb - if you know about people having problems finding the right ALSA driver for their hardware, please name the concrete problems so that we can improve the description and/or help text of these ALSA options.

I'm not sure how my original note might have been clearer, but let me try again. You are about to delete a number of OSS drivers because there are ALSA drivers for the hardware. I am assuming that for each of these drivers you have some ALSA driver in mind, rather than just general handwaving. I therefore suggest that it would be good if one person, that would be you, could do a little Kconfig magic so that when 'make oldconfig' on new kernel source fails to support sound, there might be a message in the output with a hint, like 'OSS driver has been deleted and ALSA driver should support this hardware.' So one person who I bet knows which replacement drivers are most likely could save some effort for many people who otherwise may have to read help on a number of drivers (naming is not always obvious), or grep through the driver source for board or chipset names giving a clue. If Kconfig can't do this, fine, I haven't studied it in years, nor ever been an expert. If you have no idea what drivers replace the ones you are deleting and are only following orders, fine too (but I doubt that).
But no improvement to ALSA help text would save as many people as much time as a one line message telling them the most likely driver to support similar hardware, avoiding the need to look at that text, or at least letting the cautious look at the most likely text first. Since you are the agent of change in breaking many existing configs, I thought you might be inclined to at least give a clue if it were small effort on your part.
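The stub idea might look something like this hypothetical Kconfig fragment (the ES1371/SND_ENS1371 symbols are real; the [REMOVED] stub convention is invented here purely as a sketch):

```kconfig
config SOUND_ES1371
	tristate "Creative Ensoniq AudioPCI 97 (ES1371) [REMOVED]"
	depends on BROKEN
	help
	  This OSS driver has been removed.  The same hardware is
	  supported by the ALSA driver selected with CONFIG_SND_ENS1371,
	  "(Creative) Ensoniq AudioPCI 1371/1373".
```

Keeping the old symbol behind BROKEN means 'make oldconfig' still shows its prompt and help text, pointing straight at the replacement, without ever building anything.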
Re: RSDL v0.31
David Schwartz wrote:

there were multiple attempts with renicing X under the vanilla scheduler, and they were utter failures most of the time. _More_ people complained about interactivity issues _after_ X has been reniced to -5 (or -10) than people complained about nice 0 interactivity issues to begin with.

Unfortunately, nicing X is not going to work. It causes X to pre-empt any local process that tries to batch requests to it, defeating the batching. What you really want is X to get scheduled after the client pauses in sending data to it or has sent more than a certain amount. It seems kind of crazy to put such logic in a scheduler. Perhaps when one process unblocks another, you put that other process at the head of the run queue but don't pre-empt the currently running process. That way, the process can continue to batch requests, but X's maximum latency delay will be the quantum of the client program.

In general I think that's the right idea. See below for more...

The vanilla scheduler's auto-nice feature rewards _behavior_, so it gets X right most of the time. The fundamental issue is that sometimes X is very interactive - we boost it then, there's lots of scheduling but nice low latencies. Sometimes it's a hog - we penalize it then and things start to batch up more and we get out of the overload situation faster. That's the case even if all you care about is desktop performance. no doubt it's hard to get the auto-nice thing right, but one thing is clear: currently RSDL causes problems in areas that worked well in the vanilla scheduler for a long time, so RSDL needs to improve. RSDL should not lure itself into the false promise of 'just renice X statically'. It won't work. (You might want to rewrite X's request scheduling - but if so then i'd like to see that being done _first_, because i just don't trust such 10-mile-distance problem analysis.)

I am hopeful that there exists a heuristic that both improves this problem and is also inherently fair.
If that's true, then such a heuristic can be added to RSDL without damaging its properties and without requiring any special settings. Perhaps longer-term latency benefits to processes that have yielded in the past? I think there are certain circumstances, however, where it is inherently reasonable to insist that 'nice' be used. If you want a CPU-starved task to get more than 1/X of the CPU, where X is the number of CPU-starved tasks, you should have to ask for that. If you want one CPU-starved task to get better latency than other CPU-starved tasks, you should have to ask for that.

I agree for giving a process more than a fair share, but I don't think latency is the best term for what you describe later. If you think of latency as the time between a process unblocking and the time when it gets CPU, that is a more traditional interpretation. I'm not really sure "latency" and "CPU-starved" are compatible. I would like to see processes at the head of the queue (for latency) which were blocked for long term events: keyboard input, network input, mouse input, etc. Then processes blocked for short term events like disk, then processes which exhausted their time slice. This helps latency and responsiveness, while keeping all processes running. A variation is to give those processes at the head of the queue short

Fundamentally, the scheduler cannot do it by itself. You can create cases where the load is precisely identical and one person wants X and another person wants Y. The scheduler cannot know what's important to you. DS
Re: RSDL v0.31
Linus Torvalds wrote:

On Tue, 20 Mar 2007, Willy Tarreau wrote: Linus, you're unfair with Con. He initially was on this position, and lately worked with Mike by proposing changes to try to improve his X responsiveness.

I was not actually so much speaking about Con, as about a lot of the tone in general here. And yes, it's not been entirely black and white. I was very happy to see the "try this patch" email from Al Boldi - not because I think that patch per se was necessarily the right fix (I have no idea), but simply because I think that's the kind of mindset we need to have. Not a lot of people really *like* the old scheduler, but it's been tweaked over the years to try to avoid some nasty behaviour. I'm really hoping that RSDL would be a lot better (and by all accounts it has the potential for that), but I think it's totally naïve to expect that it won't need some tweaking too. So I'll happily still merge RSDL right after 2.6.21 (and it won't even be a config option - if we want to make it good, we need to make sure *everybody* tests it), but what I want to see is that "can do" spirit wrt tweaking for issues that come up.

May I suggest that if you want proper testing, it not only should be a config option but a boot time option as well? Otherwise people will be comparing an old scheduler with an RSDL kernel, and they will diverge as time goes on. More people would be willing to reboot and test on a similar load than will keep two versions of the kernel around. And if you get people testing RSDL against a vendor kernel which might be hacked, it will be even less meaningful. Please consider the benefits of making RSDL the default scheduler, and leaving people with the old scheduler with an otherwise identical kernel as a fair and meaningful comparison. There, that's a technical argument ;-)

Because let's face it - nothing is ever perfect.
Even a really nice conceptual idea always ends up hitting the "but in real life, things are ugly and complex, and we've depended on behaviour X in the past and can't change it, so we need some tweaking for problem Y". And everything is totally fixable - at least as long as people are willing to! Linus
Re: [2.6 patch] the scheduled eepro100 removal
Adrian Bunk wrote:

This patch contains the scheduled removal of the eepro100 driver. Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

This keeps coming around, but I haven't seen an answer to the questions raised by Eric Piel or Kiszka. I do know that e100 didn't work on some IBM rackmount servers and eepro100 did, but since I'm no longer responsible for those machines I can't retest. Perhaps someone will be able to provide data points. These were current IBM offerings as of about three years ago; I had a few dozen of them at one time.
Re: max_loop limit
Tomas M wrote:

255 loop devices are insufficient? What kind of scenario do you have in mind? Thank you very much for replying. In 1981, Bill Gates said that 64KB of memory is enough for everybody. And you know how much RAM you have right now. :)

Actually, I believe the number was 640k, the quote included the phrase "should be", and it referred to the available memory on the IBM PC. And this was after IBM decided to put the video adapter in memory at 640k, Intel decided to provide only 1MB of address space on the 8086, and it was in the context of mainframes of the day, some of which could only address 1MB. And having run clients with three users on an XT with just that 640kB and UNIX, I don't think he was wrong about the memory for that time, just the O/S. BTW: anyone got a copy of PC/IX (SysIII for the XT) around? I'd love to run that in a VM just for the comparison.
Re: max_loop limit
roland wrote:

partitions on loop or device-mapper devices? you can use the kpartx tool for this. bryn m. reeves told me that it's probably possible to create udev rules that will automatically create partition maps when a new loop device is set up, which is better than adding partitioning logic into dm-loop, for example.

It is certainly possible to create a partitionable RAID device from a loop device. Should be possible to use nbd as well, but I can't seem to get nbd to work on 2.6.21-rc (my working system runs 2.6.17).

example: kpartx -a /dev/mapper/loop0 # ls /dev/mapper/loop0* /dev/mapper/loop0 /dev/mapper/loop0p1 /dev/mapper/loop0p2 /dev/mapper/loop0p3 i have seen a patch for loop.c doing this, though. search the archives for this regards roland

On Thu, Mar 22, 2007 at 02:33:14PM +, Al Viro wrote: Correction: current ABI is crap. To set the thing up you need to open it and issue an ioctl. Which is a bloody bad idea, for obvious reasons... Agreed. What would be a right way? Global device ala ptmx/tun/tap? New syscall? Something else? OG.
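A fuller sketch of the kpartx workflow (needs root, loop support, and the parted/kpartx tools; all paths here are illustrative):

```shell
# Build a small disk image with one partition, attach it to a loop
# device, then let kpartx create /dev/mapper entries for the partitions.
dd if=/dev/zero of=/tmp/disk.img bs=1M count=64
parted -s /tmp/disk.img mklabel msdos mkpart primary ext2 1MiB 100%
loopdev=$(losetup -f --show /tmp/disk.img)   # e.g. /dev/loop0
kpartx -a "$loopdev"                         # adds /dev/mapper/loop0p1
ls /dev/mapper/
kpartx -d "$loopdev"                         # tear down the mappings
losetup -d "$loopdev"
```

This is the same mechanism roland describes, just spelled out end to end; a udev rule could run the kpartx -a step automatically when the loop device appears.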
Re: [linux-usb-devel] possible USB regression with 2.6.21-rc4: iPod doesn't work
Tino Keitel wrote:

On Mon, Mar 26, 2007 at 17:15:53 -0400, Alan Stern wrote: [...] The lack of messages from the iPod seems to indicate that the hub isn't working right. You could try plugging the iPod into a different hub or directly into the computer. Or maybe into a different port of that hub.

Uh, I think I found the reason for the strange behaviour at shutdown/suspend. When I unload the usblp module, then the iPod is recognized.

And that's not the case with 2.6.20?
Re: kmalloc() with size zero
Stephane Eranian wrote:

Hi, On Sun, Mar 25, 2007 at 06:30:34PM +0200, Folkert van Heusden wrote: I'd say feature, glibc's malloc also returns an address on malloc(0).

This is implementation defined - the standard allows for return of either null or an address. Entirely for entertainment: AIX (5.3) returns NULL, IRIX returns a valid address.

That's interesting, so many different behaviors! Personally, I still prefer when malloc(0) returns zero because it makes it easier to catch errors.

Exactly; the address returned is not really useful, the improved error checking is useful.
[2.6.21-rc5-git1][KVM] exception on start VM
Starting a VM for Win98SE:

posidon:root /usr/local/kvm-15/bin/qemu -m 128 -hda Win98SE-2.kvm
exception 13 (0)
rax f000ff53 rbx rcx 005a rdx 000e rsi 001100c4 rdi 0002a002
rsp 00086650 rbp 667a r8 r9 r10 r11 r12 r13 r14 r15
rip d350 rflags 00237206
cs fff8 (000fff80/ p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
ds (/ p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
es (/ p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
ss 103f (000103f0/ p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
fs (/ p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
gs (/ p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
tr (0885/2088 p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0)
ldt (/ p 1 dpl 0 db 0 s 0 type 2 l 0 g 0 avl 0)
gdt 87244/2f
idt 0/3ff
cr0 6010 cr2 0 cr3 0 cr4 0 cr8 0 efer 0
Aborted
posidon:root

Hope that's useful; I was looking at nbd issues and just tried for curiosity.

-- [EMAIL PROTECTED]
Re: [2.6 patch] the scheduled eepro100 removal
Brandeburg, Jesse wrote:

Roberto Nibali wrote: Sounds sane to me. My overall opinion on eepro100 removal is that we're not there yet. Rare problem cases remain where e100 fails but eepro100 works, and eepro100 is the older driver, so it's low priority for everybody. Needs to happen, though...

It seems that several Tyan Opteron based systems were using an IPMI add-on card; the IPMI card shares the onboard Intel 100 Mbit NIC. You need to use eepro100 instead of e100, otherwise e100 will shut down the OOB (out of band) connection for IPMI when the OS shuts down.

I find it hard to believe that something as common as IPMI in conjunction with the IPMI technology wasn't tested in Intel's lab. From my experience with Intel server boards, onboard IPMI (all offered versions) and e100/e1000 NICs, I've never ever experienced any problems operating the BMC over the NIC. Also, I don't quite understand your point about the IPMI card sharing the 100Mbit/s NIC onboard? What exactly is shared?

It's a legit problem, but only with this *one* system. Of course the eepro100 driver is not taking a lot of maintenance either; removing it is not critical as long as there is a legitimate need to support old hardware.
Re: 2.6.20.3 AMD64 oops in CFQ code
Tejun Heo wrote: [resending; my mail service was down for more than a week and this message didn't get delivered.] [EMAIL PROTECTED] wrote: Anyway, what's annoying is that I can't figure out how to bring the drive back online without resetting the box. It's in a hot-swap enclosure, but power-cycling the drive doesn't seem to help. I thought libata hotplug was working? (SiI3132 card, using the sil24 driver.) Yeah, it's working, but failing resets are considered highly dangerous (in that the controller status is unknown and may cause something dangerous like screaming interrupts), and the port is muted after that. The plan is to handle this with polling hotplug, such that libata tries to revive the port if a PHY status change is detected by polling. Patches are available, but they need other things to be resolved before they can be integrated. I think it'll happen before the summer. Anyways, you can tell libata to retry the port by manually telling it to rescan the port (echo "- - -" > /sys/class/scsi_host/hostX/scan). I won't say that's voodoo, but if I ever did it I'd wipe down my keyboard with holy water afterward. ;-) Well, I did save the message in my tricks file, but it sounds like a last-ditch effort after something goes very wrong. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979
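For reference, the rescan trick Tejun mentions can be wrapped in a small guard so it fails loudly instead of silently. A minimal sketch; "host2" is a hypothetical example (the real number depends on which SCSI host your SiI3132 port registered as, so check /sys/class/scsi_host/ first), and the write itself needs root:

```shell
HOST=host2   # hypothetical; list /sys/class/scsi_host/ for the real number
SCAN=/sys/class/scsi_host/$HOST/scan
if [ -w "$SCAN" ]; then
    # "- - -" is a wildcard: all channels, all targets, all LUNs on this host
    echo "- - -" > "$SCAN"
    echo "rescan issued on $HOST"
else
    echo "no writable scan file for $HOST (needs root and a real SCSI host)"
fi
```

The script only reports what it did (or could not do); it makes no attempt to verify that the port actually came back, which is exactly the part that was voodoo in the first place.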
Re: [PATCH][RFC] fast file mapping for loop
Jens Axboe wrote: Hi, loop.c currently uses the page cache interface to do IO to file-backed devices. This works reasonably well for simple things, like mapping an iso9660 file for direct mount and other read-only workloads. Writing is somewhat problematic, as anyone who has really used this feature can attest to - it tends to confuse the vm (hello kswapd) since it breaks dirty accounting and behaves very erratically on writeout. Did I mention that it's pretty slow as well, for both reads and writes?

Since you are looking for comments, I'll mention a loop-related behavior I've been seeing and see if it gets comments or is useful, since it can be used to tickle the bad behavior on demand. I have a 6GB sparse file, which I mount with cryptoloop and populate as an ext3 filesystem (more later on why). I then copy ~5.8GB of data to the filesystem, which is unmounted to be burnt to a DVD. Before it's burned, the dvdisaster application is used to add some ECC information to the end and make an image which fits on a DVD-DL. Media will be burned and distributed to multiple locations. The problem: when copying with rsync, the copy runs at ~25MB/s for a while, then falls into a pattern of bursts of 25MB/s followed by 10-15 sec of iowait with no disk activity. So I tried doing the copy with cpio:

  find . -depth | cpio -pdm /mnt/loop

which shows exactly the same behavior. Then, for no good reason, I tried

  find . -depth | cpio -pBdm /mnt/loop

and the copy ran at 25MB/s for the whole data set. I was able to see similar results with a pure loop mount; I only mention the crypto for accuracy. Because many of these images have been shipped over the last two years, new loop code would only be useful in this case if it were compatible, so old data sets could still be read.

It also behaves differently than a real drive. For writes, completions are done once they hit the page cache. Since loop queues bios async and hands them off to a thread, you can have a huge backlog of stuff to do.
It's hard to guarantee data safety for filesystems on top of loop without making it even slower than it currently is. Back when loop was only used for iso9660 mounting and other simple things, this didn't matter. Now it's often used in xen (and other) setups where we do care about performance AND writing. So the below is an attempt at speeding up loop and making it behave like a real device. It's a somewhat quick hack and is still missing one piece to be complete, but I'll throw it out there for people to play with and comment on. So how does it work? Instead of punting IO to a thread and passing it through the page cache, we instead attempt to send the IO directly to the filesystem block that it maps to. loop maintains a prio tree of known extents in the file (populated lazily on demand, as needed).

Advantages of this approach:

- It's fast; loop will basically work at device speed.
- It's fast; it doesn't put a huge amount of load on the system when busy. When I did comparison tests on my notebook with an external drive, running a simple tiobench on the current in-kernel loop with a sparse file backing rendered the notebook basically unusable while the test was ongoing. The remapper version had no more impact than it did when used directly on the external drive.
- It behaves like a real block device.
- It's easy to support IO barriers, which is needed to ensure safety, especially in virtualized setups.

Disadvantages:

- The file block mappings must not change while loop is using the file. This means that we have to ensure exclusive access to the file, and this is the bit that is currently missing in the implementation. It would be nice if we could just do this via open(); ideas welcome...
- It'll tie down a bit of memory for the prio tree. This is GREATLY offset by the reduced page cache footprint, though.
- It cannot be used with the loop encryption stuff.
dm-crypt should be used instead, on top of loop (which, I think, is even the recommended way to do this today, so not a big deal). -- Bill Davidsen [EMAIL PROTECTED] "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
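The sparse-file setup from the report above is easy to reproduce in miniature. A hedged sketch, with the size shrunk from 6GB to 64MB and without the cryptoloop layer; the losetup/mkfs/mount steps are shown only as comments because they need root, and the paths here are throwaway examples, not the original data set:

```shell
# Create a sparse backing file like the 6GB image in the report, just
# smaller. "dd seek" with count=0 extends the file without allocating blocks.
IMG=$(mktemp /tmp/loopimg.XXXXXX)
dd if=/dev/zero of="$IMG" bs=1 count=0 seek=64M 2>/dev/null
ls -ls "$IMG"            # first column: blocks actually allocated (near 0)
SIZE=$(stat -c %s "$IMG")
echo "sparse image is $SIZE bytes"

# As root, one would then populate it and run the copy, e.g.:
#   losetup /dev/loop0 "$IMG"
#   mkfs -t ext3 /dev/loop0
#   mount /dev/loop0 /mnt/loop
#   find . -depth | cpio -pBdm /mnt/loop   # the -B variant that ran smoothly
rm -f "$IMG"
```

Nothing here explains *why* cpio -B (5120-byte records) avoids the burst/stall pattern while the default blocking does not; it just gives a reproducible starting point for anyone who wants to poke at the writeout behavior.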
Re: [RFC][PATCH] per-task I/O throttling
Andrea Righi wrote: Allow limiting the bandwidth of I/O-intensive processes, like backup tools running in the background, large file copies, checksums on huge files, etc. These kinds of processes can noticeably impact system responsiveness for some time, and playing with task priorities is not always an acceptable solution. This patch allows specifying a maximum I/O rate in sectors per second for each single process via /proc/PID/io_throttle (the default is zero, which means no limit). It would seem to me that this would be vastly more useful in the real world if there were a settable default, so that administrators could avoid having to find and tune individual user processes. And it would seem far less common that the admin would want to set the limit *up* for a given process, and it's likely to be one known to the admin, at least by name. Of course, if you want to go to the effort to make it fully tunable, it could have a default by UID or GID. Useful on machines shared by students or managers. -- Bill Davidsen [EMAIL PROTECTED] "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
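For the curious, usage of the proposed knob would look something like the sketch below. This assumes the RFC patch is applied; on a stock kernel /proc/PID/io_throttle simply does not exist, which the script reports instead of failing:

```shell
# /proc/PID/io_throttle exists only with the RFC patch applied; the value
# is a cap in sectors per second, and 0 (the default) means no limit.
PID=$$
KNOB=/proc/$PID/io_throttle
if [ -w "$KNOB" ]; then
    echo 2048 > "$KNOB"   # ~1 MB/s with 512-byte sectors
    echo "io_throttle for $PID set to $(cat "$KNOB")"
else
    echo "io_throttle not present for $PID (stock kernel, RFC patch not applied)"
fi
```

A settable system-wide default, as suggested above, would replace the per-PID write with a single knob the admin sets once; the per-process file would then only be needed for exceptions.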