Re: RSDL v0.31
> So in an attempt to summarise the situation, what are the advantages
> of RSDL over mainline.
>
> Fairness

why do you think fairness is good, especially always good?

> Starvation free

even starvation is sometimes a good thing - there's a place for processes that only use the CPU if it is otherwise idle. that is, they are deliberately starved all the rest of the time.

> Much lower and bound latencies

in an average sense? also, under what circumstances does this actually matter? (please don't offer something like RT audio on an overloaded machine - that's operator error, not something to design for.)

> Deterministic

not a bad thing, but how does this make itself apparent and of value to the user? I think everyone is extremely comfortable with non-determinism (stemming from networks, caches, interleaved workloads, etc).

> Better interactivity for the majority of cases.

how is this measured? is this statement really just a reiteration of the latency claim?

> Now concentrating on the very last aspect since that seems to be the
> sticking point.

nah, I think the fairness and latency claims are the real issues.

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
> Something is seriously wrong with that OOM killer.

do you know you don't have to operate in OOM-slaughter mode? "vm.overcommit_memory = 2" in your /etc/sysctl.conf puts you into a mode where the kernel tracks your "committed" memory needs, and will eventually cause some allocations to fail. this is often much nicer than the default random OOM slaughter. (you probably also need to adjust vm.overcommit_ratio with some knowledge of your MemTotal and SwapTotal.)

regards, mark hahn.
Re: Where is the performance bottleneck?
> 8 SCSI U320 (15000 rpm) disks where 4 disks (sdc, sdd, sde, sdf)

figure each is worth, say, 60 MB/s, so you'll peak (theoretically) at 240 MB/s per channel.

> The U320 SCSI controller has a 64 bit PCI-X bus for itself, there is no other
> device on that bus. Unfortunately I was unable to determine at what speed
> it is running, here the output from lspci -vv:
> ...
> Status: Bus=2 Dev=4 Func=0 64bit+ 133MHz+ SCD- USC-, DC=simple,

the "133MHz+" is a good sign. OTOH the latency (72) seems rather low - my understanding is that that would noticeably limit the size of burst transfers.

> Anyway, I thought with this system I would get theoretically 640 MB/s using
> both channels.

"theoretically" in the same sense as "according to quantum theory, Bush and BinLadin may swap bodies tomorrow morning at 4:59."

> write speeds for this system. But testing shows that the absolute maximum I
> can reach with software raid is only approx. 270 MB/s for writing. Which is
> very disappointing.

it's a bit low, but "very" is unrealistic...

> deadline and distribution is fedora core 4 x86_64 with all updates. Chunksize
> is always the default from mdadm (64k). Filesystem was always created with the
> command mke2fs -j -b4096 -O dir_index /dev/mdx.

bear in mind that a 64k chunksize means that an 8 disk raid5 will really only work well for writes that are multiples of 7*64=448K...

> I also have tried with 2.6.13-rc7, but here the speed was much lower, the
> maximum there was approx. 140 MB/s for writing.

hmm, there should not have been any such dramatic slowdown.
> Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> Raid0 (8 disk)15744M 54406  96 247419 90 100752 25 60266 98 226651 29 830.2  1
> Raid0s(4 disk)15744M 54915  97 253642 89  73976 18 59445 97 198372 24 659.8  1
> Raid0s(4 disk)15744M 54866  97 268361 95  72852 17 59165 97 187183 22 666.3  1

you're obviously saturating something already with 2 disks. did you play with "blockdev --setra" settings?

> Raid5 (8 disk)15744M 55881  98 153735 51  61680 24 56229 95 207348 44 741.2  1
> Raid5s(4 disk)15744M 55238  98  81023 28  36859 14 56358 95 193030 38 605.7  1
> Raid5s(4 disk)15744M 54920  97  83680 29  36551 14 56917 95 185345 35 599.8  1

the block-read shows that even with 3 disks, you're hitting ~190 MB/s, which is pretty close to your actual disk speed. the low value for block-out is probably just due to non-stripe writes needing R/M/W cycles.

> /dev/sdc      15744M 53861  95 102270 35  25718  6 37273 60  76275  8 377.0  0

the block-out is clearly distorted by buffer-cache (too high), but the input rate is good and consistent. obviously, it'll fall off somewhat towards inner tracks, but will probably still be above 50.

> Why do I only get 247 MB/s for writing and 227 MB/s for reading (from the
> bonnie++ results) for a Raid0 over 8 disks? I was expecting to get nearly
> three times those numbers if you take the numbers from the individual disks.

expecting 3x is unreasonable; 2x (480 or so) would be good. I suspect that some (sw kernel) components are badly tuned for fast IO. obviously, most machines are in the 50-100 MB/s range, so this is not surprising. readahead is certainly one, but there are also magic numbers in MD as well, not to mention PCI latency, scsi driver tuning, probably even /proc/sys/vm settings. I've got some 4x2.6G opteron servers (same board, 32G PC3200), but alas, end-users have found out about them.
not to mention that they only have 3x160G SATA disks...

regards, mark hahn.
Re: [ckrm-tech] Re: 2.6.13-rc3-mm1 (ckrm)
> > if CKRM is just extensions, I think it should be an external patch.
> > if it provides a path towards unifying the many disparate RM mechanisms
> > already in the kernel, great!
>
> OK, so if it provides a path towards unifying these, what should happen
> to the old interfaces when they conflict with those offered by CKRM?

I don't think the name matters, as long as the RM code is simplified/unified. that is, the only difference at first would be a change in name - same behavior.

> For instance, I'm considering how a per-class (re)nice setting would
> work. What should happen when the user (re)nices a process to a
> different value than the nice of the process' class? Should CKRM:

it has to behave as it does now, unless the admin has imposed some class structure other than the normal POSIX one (ie, nice pertains only to a process and is inherited by future children.)

> a) disable the old interface by
>    i) removing it
>    ii) return an error when CKRM is active
>    iii) return an error when CKRM has specified a nice value for the
>         process via membership in a class
>    iv) return an error when the (re)nice value is inconsistent with the
>        nice value assigned to the class

some interfaces must remain (renice), and if their behavior is implemented via CKRM, it must, by default, act as before. other interfaces (say overcommit_ratio) probably don't need to remain.

> b) trust the user, ignore the class nice value, and allow the new nice
>    value

users can only nice up, and that policy needs to remain, obviously. you appear to be asking what happens when the scope of the old mechanism conflicts with the scope determined by admin-set CKRM classes. I'd say that nicing a single process should change the nice of the whole class that the process is in, if any. otherwise, it acts to rip that process out of the class, which is probably even less 'least surprise'.

> This sort of question would probably come up for any other CKRM
> "embraced-and-extended" tunables.
> Should they use the answer to this
> one, or would it go on a case-by-case basis?

I don't see that CKRM should play by rules different from other kernel improvements - preserve standard/former behavior when that behavior is documented (certainly nice is). in the absence of admin-set classes, nice would behave the same. all CKRM is doing here is providing a broader framework to hang the tunables on. it should be able to express all existing tunables in scope.
Re: [ckrm-tech] Re: 2.6.13-rc3-mm1 (ckrm)
> > actually, let me also say that CKRM is on a continuum that includes
> > current (global) /proc tuning for various subsystems, ulimits, and
> > at the other end, Xen/VMMs. it's conceivable that CKRM could wind up
> > being useful and fast enough to subsume the current global and per-proc
> > tunables. after all, there are MANY places where the kernel tries to
> > maintain some sort of context to allow it to tune/throttle/readahead
> > based on some process-linked context. "embracing and extending"
> > those could make CKRM attractive to people outside the mainframe market.
>
> Seems like an excellent suggestion to me! Yeah, it may be possible to
> maintain the context the kernel keeps on a per-class basis instead of
> globally or per-process.

right, but are the CKRM people ready to take this on? for instance, I just grepped 'throttle' in kernel/mm and found a per-task RM in page-writeback.c. it even has vaguely class-oriented logic, since it exempts RT tasks. if CKRM can become a way to make this stuff cleaner and more effective (again, for normal tasks), then great. but bolting on a big, new, different, intrusive mechanism that slows down all normal jobs by 3% just so someone can run 10K mostly-idle guests on a giant Power box, well, that's gross.

> The real question is what constitutes a useful
> "extension" :).

if CKRM is just extensions, I think it should be an external patch. if it provides a path towards unifying the many disparate RM mechanisms already in the kernel, great!

> I was thinking that per-class nice values might be a good place to
> start as well. One advantage of per-class as opposed to per-process nice
> is the class is less transient than the process since its lifetime is
> determined solely by the system administrator.

but the Linux RM needs to subsume traditional Unix process groups, and inherited nice/sched class, and even CAP_ stuff. I think CKRM could start to do this, since classes are very general.
but merely adding a new, incompatible feature is just Not A Good Idea.

regards, mark hahn.
Re: 2.6.13-rc3-mm1 (ckrm)
> > > the fast path slower and less maintainable. if you are really concerned
> > > about isolating many competing servers on a single piece of hardware, then
> > > run separate virtualized environments, each with its own user-space.
> >
> > And the virtualisation layer has to do the same job with less
> > information. That to me implies that the virtualisation case is likely
> > to be materially less efficient, it's just that the inefficiency you are
> > worried about is hidden in a different piece of code.

I imagine you, like me, are currently sitting in the Xen talk, and I don't believe they are or will do anything so dumb as to throw away or lose information. yes, in principle, the logic will need to be somewhere, and I'm suggesting that the virtualization logic should be in VMM-only code so it has literally zero effect on host-native processes. *or* the host-native fast-path.

> > Secondly a lot of this doesn't matter if CKRM=n compiles to no code
> > anyway
>
> I'm actually trying to keep the impact of CKRM=y to near-zero, ergo
> only an impact if you create classes. And even then, the goal is to
> keep that impact pretty small as well.

but to really do CKRM, you are going to want quite extensive interaction with the scheduler, VM page replacement policies, etc. all incredibly performance-sensitive areas.

actually, let me also say that CKRM is on a continuum that includes current (global) /proc tuning for various subsystems, ulimits, and at the other end, Xen/VMMs. it's conceivable that CKRM could wind up being useful and fast enough to subsume the current global and per-proc tunables. after all, there are MANY places where the kernel tries to maintain some sort of context to allow it to tune/throttle/readahead based on some process-linked context. "embracing and extending" those could make CKRM attractive to people outside the mainframe market.

> Plus you won't have to manage each operating system instance which
> can grow into a pain under virtualization.
> But I still maintain that
> both have their place.

CKRM may have its place in an externally-maintained patch ;)

regards, mark hahn.
Re: 2.6.13-rc3-mm1 (ckrm)
> of the various environments. I don't think you are one of those end
> users, though. I don't think I'm required to make everyone happy all
> the time. ;)

the issue is whether CKRM (in its real form, not this thin edge) will noticeably hurt Linux's fast-path.
Re: 2.6.13-rc3-mm1 (ckrm)
> > > yes, that's the crux. CKRM is all about resolving conflicting resource
> > > demands in a multi-user, multi-server, multi-purpose machine. this is a
> > > huge undertaking, and I'd argue that it's completely inappropriate for
> > > *most* servers. that is, computers are generally so damn cheap that
> > > the clear trend is towards dedicating a machine to a specific purpose,
> > > rather than running eg, shell/MUA/MTA/FS/DB/etc all on a single machine.
>
> This is a big NAK - if computers are so damn cheap, why is virtualization
> and consolidation such a big deal?

Well, the answer is actually that yes, you did miss my point. I'm actually arguing that it's bad design to attempt to arbitrate within a single shared user-space. you make the fast path slower and less maintainable. if you are really concerned about isolating many competing servers on a single piece of hardware, then run separate virtualized environments, each with its own user-space.
Re: 2.6.13-rc3-mm1 (ckrm)
> I suspect that the main problem is that this patch is not a mainstream
> kernel feature that will gain multiple uses, but rather provides
> support for a specific vendor middleware product used by that
> vendor and a few closely allied vendors. If it were smaller or
> less intrusive, such as a driver, this would not be a big problem.
> That's not the case.

yes, that's the crux. CKRM is all about resolving conflicting resource demands in a multi-user, multi-server, multi-purpose machine. this is a huge undertaking, and I'd argue that it's completely inappropriate for *most* servers. that is, computers are generally so damn cheap that the clear trend is towards dedicating a machine to a specific purpose, rather than running eg, shell/MUA/MTA/FS/DB/etc all on a single machine.

this is *directly* in conflict with certain prominent products, such as the Altix and various less-prominent Linux-based mainframes. they're all about partitioning/virtualization - the big-iron aesthetic of splitting up a single machine. note that it's not just about "big", since cluster-based approaches can clearly scale far past big-iron, and are in effect statically partitioned.

yes, buying a hideously expensive single box, and then chopping it into little pieces is more than a little bizarre, and is mainly based on a couple of assumptions:

- that clusters are hard. really, they aren't. they are not necessarily higher-maintenance, can be far more robust, and usually do cost less. just about the only bad thing about clusters is that they tend to be somewhat larger in size.

- that partitioning actually makes sense. the appeal is that if you have a partition to yourself, you can only hurt yourself. but it also follows that burstiness in resource demand cannot be overlapped without either constantly tuning the partitions or infringing on the guarantee.

CKRM is one of those things that could be done to Linux, and will benefit a few, but which will almost certainly hurt *most* of the community.
let me say that the CKRM design is actually quite good. the issue is whether the extensive hooks it requires can be done (at all) in a way which does not disproportionately hurt maintainability or efficiency. CKRM requires hooks into every resource-allocation decision fastpath:

- if CKRM is not CONFIG, the only overhead is software maintenance.
- if CKRM is CONFIG but not loaded, the overhead is a pointer check.
- if CKRM is CONFIG and loaded, the overhead is a pointer check and a nontrivial callback.

but really, this is only for CKRM-enforced limits. CKRM really wants to change behavior in a more "weighted" way, not just causing an allocation/fork/packet to fail. a really meaningful CKRM needs to be tightly integrated into each resource manager - affecting each scheduler (process, memory, IO, net). I don't really see how full-on CKRM can be compiled out, unless these schedulers are made fully pluggable.

finally, I observe that pluggable, class-based resource _limits_ could probably be done without callbacks and potentially with low overhead. but mere limits don't meet CKRM's goal of flexible, wide-spread resource partitioning within a large, shared machine.

regards, mark hahn.

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.11, IDE: Strange scheduling behaviour: high-pri RT process not scheduled?
> I've written a small test program which enables periodic RTC interrupts
> at 8192 Hz and then goes into a loop reading /dev/rtc and collecting
> timing statistics (using the rdtscl macro).

straightforward test, used for many years in the linux community (I claim to have been the first to publish it on lkml ;)

> The test system runs a 2.6.11 kernel (no SMP) on a Pentium3 500 MHz
> embedded hardware.

which probably has memory bandwidth of at most a couple hundred MB/s, which is really horrible by modern standards.

> However, things break seriously when exercising the CF card in parallel
> (e.g. with a dd if=/dev/hda of=/dev/null):
>
> * The rtc *interrupt handler* is delayed for up to 250 *micro*seconds.
> This is very bad for my purpose, but easy to explain: It is roughly the
> time needed to transfer 512 Bytes from a CF card which can transfer 2
> Mbyte/sec, and obviously, the CPU blocks all interrupts while making pio
> transfers. (Why? Is this really necessary?)

even with -u1, isn't there still a softirq queue that will delay the wakeup of your user-space tester?

> * The *test program* is regularly blocked for 63 *milli*seconds,
> sometimes for up to 300 *milli*seconds, which is absolutely
> unacceptable.

guessing that's VM housekeeping.

> Now the big question:
> *** Why doesn't my rt test program get *any* CPU for 63 jiffies? ***
> (the system ticks at 1000 HZ)

because it's user-space. the 'rt' is a bit of a misnomer - it's merely a higher-priority, less-preemptable job.

> * The dd program obviously gets some CPU regularly (because it copies 2
> MB/s, and because no other program could cause the 1 % user CPU load).
> The dd runs at normal shell scheduling priority, so it should be
> preempted immediately by my test program!

out of curiosity, does running it with "nice -n 19" change anything?

> 2.) Using a realtime-preempt-2.6.12-rc1-V0.7.41-11 kernel with
> PREEMPT_RT:
> If my test program runs at rtpri 99, the problem is gone: It is
> scheduled within 30 microseconds after the rtc interrupt.
> If my test program runs at rtpri 1, it still suffers from delays
> in the 30 to 300 millisecond range.

so your problem is solved, no? also, did you try a (plain) preemptable kernel?
Re: 2.4.6-pre2, pre3 VM Behavior
> > Would it be possible to maintain a dirty-rate count
> > for the dirty buffers?
> >
> > For example, it is possible to figure an approximate
> > disk subsystem speed from most of the given information.
>
> Disk speed is difficult. I may enable and disable swap on any number of ...
> You may be able to get some useful approximations, but you
> will probably not be able to get good numbers in all cases.

a useful approximation would be simply an idle flag. for instance, if the disk is idle, then cleaning a few inactive-dirty pages would make perfect sense, even in the absence of memory pressure.
Re: VM Report was:Re: Break 2.4 VM in five easy steps
> reads the RTC device. The patched RTC driver can then
> measure the elapsed time between the interrupt and the
> read from userspace. Voila: latency.

interesting, but I'm not sure there's much advantage over doing it entirely in user-space with the normal /dev/rtc: http://brain.mcmaster.ca/~hahn/realfeel.c

it just prints out the raw time difference from when rtc should have woken up the program. you can do your own histogram; for summary purposes, something like stdev is probably best.
Re: XMM: monitor Linux MM inactive/active lists graphically
> XMM is heavily modified XMEM utility that shows graphically size of
> different Linux page lists: active, inactive_dirty, inactive_clean,
> code, free and swap usage. It is better suited for the monitoring of
> Linux 2.4 MM implementation than original (XMEM) utility.
>
> Find it here: http://linux.inet.hr/

interesting. I prefer to collect data separately from viewing it, and use the following simple perl script to do so; obviously, it generates a bunch of separate files, one for each metric, suitable for traditional filtering, gnuplot, etc.

#!/bin/perl
use IO::Handle;
require 'sys/syscall.ph';

sub gettimeofday {
    $timeval = pack("LL", ());
    syscall(&SYS_gettimeofday, $timeval, 0) != -1
        or die "gettimeofday: $!";
    ($sec, $usec) = unpack("LL", $timeval);
    return $sec + 1e-6 * $usec;
}

open(S, "/proc/stat")     || die("failed to open /proc/stat");
open(M, "/proc/meminfo")  || die("failed to open /proc/meminfo");
open(B, "/proc/slabinfo") || die("failed to open /proc/slabinfo");

open(PI, ">pi.st");     PI->autoflush(1);
open(PO, ">po.st");     PO->autoflush(1);
open(SI, ">si.st");     SI->autoflush(1);
open(SO, ">so.st");     SO->autoflush(1);
open(CX, ">ctx.st");    CX->autoflush(1);
open(MF, ">free.st");   MF->autoflush(1);
open(BF, ">buf.st");    BF->autoflush(1);
open(AC, ">act.st");    AC->autoflush(1);
open(ID, ">id.st");     ID->autoflush(1);
open(IC, ">ic.st");     IC->autoflush(1);
open(IT, ">it.st");     IT->autoflush(1);
open(SW, ">swap.st");   SW->autoflush(1);
open(BH, ">bh.st");     BH->autoflush(1);
open(IN, ">inode.st");  IN->autoflush(1);
open(DE, ">dentry.st"); DE->autoflush(1);

$c = 0;
$first = gettimeofday();
while (1) {
    sleep(1);
    $now = gettimeofday() - $first;

    seek(S, 0, SEEK_SET);
    while (<S>) {
        if (/^page\s+(\d+)\s+(\d+)$/) {
            if ($c) { print PI "$now ", 4*($1 - $pi), "\n"; }
            if ($c) { print PO "$now ", 4*($2 - $po), "\n"; }
            $pi = $1; $po = $2; next;
        }
        if (/^swap\s+(\d+)\s+(\d+)$/) {
            if ($c) { print SI "$now ", 4*($1 - $si), "\n"; }
            if ($c) { print SO "$now ", 4*($2 - $so), "\n"; }
            $si = $1; $so = $2; next;
        }
        if (/^ctxt\s+(\d+)$/) {
            if ($c) { print CX "$now ", $1 - $cx, "\n"; }
            $cx = $1; next;
        }
    }

    seek(M, 0, SEEK_SET);
    while (<M>) {
        if (/^MemFree:\s+(\d+) kB$/)      { print MF "$now ", $1, "\n"; next; }
        if (/^Buffers:\s+(\d+) kB$/)      { print BF "$now ", $1, "\n"; next; }
        if (/^Active:\s+(\d+) kB$/)       { print AC "$now ", $1, "\n"; next; }
        if (/^Inact_dirty:\s+(\d+) kB$/)  { print ID "$now ", $1, "\n"; next; }
        if (/^Inact_clean:\s+(\d+) kB$/)  { print IC "$now ", $1, "\n"; next; }
        if (/^Inact_target:\s+(\d+) kB$/) { print IT "$now ", $1, "\n"; next; }
        if (/^Swap:\s+\d+\s+(\d+)/)       { print SW "$now ", $1, "\n"; next; }
    }

    seek(B, 0, SEEK_SET);
    while (<B>) {
        if (/^buffer_head\s+(\d+)\s+(\d+)\s+(\d+)/)  { print BH "$now ", $1*$3/1024, "\n"; next; }
        if (/^inode_cache\s+(\d+)\s+(\d+)\s+(\d+)/)  { print IN "$now ", $1*$3/1024, "\n"; next; }
        if (/^dentry_cache\s+(\d+)\s+(\d+)\s+(\d+)/) { print DE "$now ", $1*$3/1024, "\n"; next; }
    }
    $c++;
}
Re: 2.4 freezes on VIA KT133
> > contrary to the implication here, I don't believe there is any *general*
> > problem with Linux/VIA/AMD stability. there are well-known issues ...
>
> VIA hardware is not suitable for anything until we _know_ the
> truth about what is wrong. VIA is hiding something big.

this is INCORRECT: we know there are specific problems with certain VIA hardware, but there is most definitely *NO* problem with other VIA hardware, which is eminently suitable for servers, workstations and cabbage-dicing controllers. afaik, there are absolutely zero problems reported with non-A kt133 machines, for instance. mine has certainly worked flawlessly for a long time, on most every 2.3/2.4 kernel over the past year+.
Re: 2.4 freezes on VIA KT133
> This report is probably not very helpful, but it may be useful for those who
> planned to purchase AMD / VIA solution for a server.

contrary to the implication here, I don't believe there is any *general* problem with Linux/VIA/AMD stability. there are well-known issues with specific items (VIA 686b, for instance), but VIA/AMD hardware is quite suitable for servers.
Re: FW: I think I've found a serious bug in AMD Athlon page_alloc.croutines, where do I mail the developer(s) ?
> I think I've found a serious bug in AMD Athlon page_alloc.c routines

there's nothing athlon-specific there.

> correct on the DFI AK75-EC motherboard, if I set the CPU kernel type to 586
> everything is 100%, if I use "Athlon" kernel type I get:
> kernel BUG at page_alloc.c:73

when you select athlon at compile time, you're mainly getting Arjan's athlon-specific page-clear and -copy functions (along with some relatively trivial alignment changes). these functions are ~3x as fast as the generic ones, and seem to cause dram/cpu-related oopses on some machines. in short: faster code pushes the hardware past stability. there's no reason, so far, to think that there's anything wrong with the code - Alan had a possible issue with prefetching and very old Athlons, but the people reporting problems like this are actually running kt133a and new fsb133 Athlons.

> I've changed RAM, Motherboard etc... still the same.

changed to a non-kt133a board? how about running fsb and/or dram at 100, rather than 133?

> Also the same system runs linux-2.2.16 100%

2.2 doesn't have the fast page-clear and -copy code afaik.

afaik, there are *no* problems on kt133 machines, and I haven't heard any pain from people who might have Ali Magic1, AMD 760 or KT266 boards, but they're still rare.
Re: PROBLEM: IDE dma_intr error on VIA chipset
> hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

read the fine faq.
Re: LVM 1.0 release decision
On Fri, 11 May 2001, Jeff Garzik wrote:
...
> Subsystems are often maintained outside the Linus tree, with code
> getting pushed (hopefully regularly) to Linus. For such scenarios, it

"maintained" *means* that the fixes/development get fed to Linus. afaict, the LVM/ISDN/etc situations were problems because the developers merely hacked on code, and failed to do the maintenance (feed Linus) part.
Re: [PATCH] arp_filter patch for 2.4.4 kernel.
> > also -- isn't it kind of wrong for arp to respond with addresses from
> > other interfaces?
>
> Usually it makes sense, because it increases your chances of successful
> communication. IP addresses are owned by the complete host on Linux, not by
> different interfaces.

this is one of those things that is still hurting Linux's credibility in the real world. people see this kind of obviously broken behavior, and install *BSD or Solaris instead. isn't this clearly a case of the kernel being too smart: making it impossible for a clueful admin to do what he needs? multi-nic machines are now quite common, but this "feature" makes them far less useful, since the stack is violating the admin's intention.

> For some weirder setups (most of them just caused by incorrect routing
> tables, but also a few legitimate ones; including incoming load balancing
> via multipath routes) it causes problems, so arpfilter was invented to
> sync ARP replies with the routing tables as needed.

there's NOTHING weird about a machine having two nics and two IPs, wanting to behave like two hosts. is there any positive/beneficial reason for the current behavior?
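[editor's note: the arp_filter behavior discussed in this thread did land as a per-interface sysctl, which kernels since 2.4 expose under net.ipv4.conf. assuming eth0 as the interface name, enabling it looks like this:]

```shell
# Make each NIC answer ARP only for addresses the kernel would route
# out that same interface, instead of answering for the whole host:
sysctl -w net.ipv4.conf.all.arp_filter=1
sysctl -w net.ipv4.conf.eth0.arp_filter=1   # or per interface

# Persist the setting across reboots:
echo "net.ipv4.conf.all.arp_filter = 1" >> /etc/sysctl.conf
```

with this set, a two-NIC, two-IP box behaves much more like the "two hosts" the message asks for.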
Re: Athlon and fast_page_copy: What's it worth ? :)
On Fri, 4 May 2001, Seth Goldberg wrote:
> Hi,
>
> Before I go any further with this investigation, I'd like to get an idea
> of how much of a performance improvement the K7 fast_page_copy will give me.
> Can someone suggest the best benchmark to test the speed of this routine?

Arjan van de Ven did the code, and he wrote a little test harness. I've hacked it a bit (http://brain.mcmaster.ca/~hahn/athlon.c); on my duron/600, kt133, pc133 cas2, it looks like this:

clear_page by 'normal_clear_page' took 7221 cycles (324.6 MB/s)
clear_page by 'slow_zero_page'    took 7232 cycles (324.1 MB/s)
clear_page by 'fast_clear_page'   took 6110 cycles (383.6 MB/s)
clear_page by 'faster_clear_page' took 2574 cycles (910.6 MB/s)
copy_page by 'normal_copy_page'   took 7224 cycles (324.4 MB/s)
copy_page by 'slow_copy_page'     took 7223 cycles (324.5 MB/s)
copy_page by 'fast_copy_page'     took 4662 cycles (502.7 MB/s)
copy_page by 'faster_copy'        took 2746 cycles (853.5 MB/s)
copy_page by 'even_faster'        took 2802 cycles (836.5 MB/s)

70% faster!
Re: DISCOVERED! Cause of Athlon/VIA KX133 Instability
> > this has nothing to do with the very specific disk corruption
> > being discussed (which has to do with the ide controller, according
> > to the most recent rumors.)
>
> Actually, I think there are 2 problems that have been discussed -- the
> disk corruption and a general instability resulting in oops'es at
> various points shortly after boot up.

I don't see this. specifically, there were scattered reports of a via-ide problem a few months ago; this is the issue that's gotten some press, and for which Alan has a fix. and there are reports of via-smp problems at boot (which go away with noapic). I see no reports of the kind of general instability you're talking about, and all the via users I've heard of have no such stability problems - me included (kt133/duron).

the only general issue is that kx133 systems seem to be difficult to configure for stability: ugly things like tweaking Vio. there's no implication that has anything to do with Linux, though.

> My memory system has been set up very conservatively and has been
> rock solid in my other board (ka7), so I doubt it's that, but I
> sure am happy to try a few more combinations of bios settings. Anything
> I should look for in particular?

how many dimms do you have? interleave settings? Vio jumper? already checked on cooling issues? and that you're not overclocking...

> > why resort to silly windows tools, when lspci under Linux does it for you?
>
> Because lspci does not display all 256 bytes of pci configuration
> information.

sure it does (from my kt133 hostbridge):

[root@crema /root]# lspci -s 00:00.0 -xxx
00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 0305 (rev 02)
00: 06 11 05 03 06 00 10 22 02 00 00 06 00 00 00 00
10: 08 00 00 e0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 27 a4 0b b4 46 02 08 08 08 00 00 00 04 08 08 08
60: 0c 00 00 00 d5 d6 d5 00 50 5d 86 0d 08 01 00 00
70: c9 88 cc 0c 0e a0 d2 00 01 b4 01 02 00 00 00 00
80: 0f 40 00 00 f0 00 00 00 02 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 02 c0 20 00 07 02 00 1f 00 00 00 00 2b 02 04 00
b0: 7f 63 2a 65 31 33 c0 0c 00 00 00 00 00 00 00 00
c0: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 0e 22 00 00 00 00 00 00 00

[root@crema /root]# od -Ax -txC /proc/bus/pci/00/00.0
00 06 11 05 03 06 00 10 22 02 00 00 06 00 00 00 00
10 08 00 00 e0 00 00 00 00 00 00 00 00 00 00 00 00
20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00
40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50 27 a4 0b b4 46 02 08 08 08 00 00 00 04 08 08 08
60 0c 00 00 00 d5 d6 d5 00 50 5d 86 0d 08 01 00 00
70 c9 88 cc 0c 0e a0 d2 00 01 b4 01 02 00 00 00 00
80 0f 40 00 00 f0 00 00 00 02 00 00 00 00 00 00 00
90 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0 02 c0 20 00 07 02 00 1f 00 00 00 00 2b 02 04 00
b0 7f 63 2a 65 31 33 c0 0c 00 00 00 00 00 00 00 00
c0 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
f0 00 00 00 00 00 00 00 0e 22 00 00 00 00 00 00 00
000100
Re: DISCOVERED! Cause of Athlon/VIA KX133 Instability
> And that's exactly what I did :)... I found that ONLY the combination
> of USE_3DNOW and forcing the athlon mmx stuff in (by doing #if 1 in
> results in this wackiness. I should also repeat that I *DO* see that

I doubt that USE_3DNOW is causing the problem, but rather that when you USE_3DNOW, the kernel streams through your northbridge at roughly twice the bandwidth. if your dram settings are flaky, this could easily trigger a problem. this has nothing to do with the very specific disk corruption being discussed (which has to do with the ide controller, according to the most recent rumors).

> The other thing i was gunna try is to dump my chipset registers using
> WPCREDIT and WPCRSET and compare them with other people on this list

why resort to silly windows tools, when lspci under Linux does it for you?

regards, mark hahn.
Re: [PATCH] Re: Linux 2.4.4-ac2
> + * Make sure the child gets the SCHED_YIELD flag cleared, even if
> + * it inherited it, to avoid deadlocks.

can anyone think of a reason that SCHED_YIELD *should* be inherited? I think it's just an oversight that fork doesn't clear it.
Re: #define HZ 1024 -- negative effects
> > Are there any negative effects of editing include/asm/param.h to change
> > HZ from 100 to 1024? Or any other number? This has been suggested as a
> > way to improve the responsiveness of the GUI on a Linux system. Does it ...
>
> Why not just run the X server at a realtime priority? Then it will get
> to respond to existing events, such as keyboard and mouse input,
> promptly without creating lots of superfluous extra clock interrupts.
> I think you will find this is a better solution.

it's surprisingly ineffective; usually, if someone thinks responsiveness is bad, there's a problem with the system. for instance, if the system is swapping, setting X (and wm, and clients) to RT makes little difference, since the kernel is stealing pages from them, regardless of their scheduling priority.

if you're curious, you might be interested in two toy programs I've attached. one is "setrealtime", which will make a pid RT, or else act as a wrapper (ala /bin/time). I have it installed suid root on my system, though this is rather dangerous if you have lusers around. the second is a simple memory-hog: mmaps a bunch of ram, and keeps it active (printing out a handy measure of how long it took to touch its pages...)

regards, mark hahn.

/* useup.c: the memory-hog. touches one word per page of a big
   anonymous mapping, printing milliseconds per pass. */
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/time.h>
#include <sys/mman.h>

volatile unsigned sink;

double second()
{
    struct timeval tv;
    gettimeofday(&tv, 0);
    return tv.tv_sec + 1e-6 * tv.tv_usec;
}

int main(int argc, char *argv[])
{
    int doWrite = 1;
    unsigned size = 80 * 1024 * 1024;
    int letter;

    while ((letter = getopt(argc, argv, "s:wrvh?")) != -1) {
        switch (letter) {
        case 's': size = atoi(optarg) * 1024 * 1024; break;
        case 'w': doWrite = 1; break;
        default:
            fprintf(stderr, "useup [-s mb][-w]\n");
            exit(1);
        }
    }

    int *base = (int *) mmap(0, size, PROT_READ|PROT_WRITE,
                             MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
    if (base == MAP_FAILED) {
        perror("mmap failed");
        exit(1);
    }
    int *end = base + size/4;

    while (1) {
        double start = second();
        if (doWrite) {
            /* write one int per 4K page */
            for (int *p = base; p < end; p += 1024)
                *p = 0;
        } else {
            unsigned sum = 0;
            for (int *p = base; p < end; p += 1024)
                sum += *p;
            sink = sum;
        }
        printf("%f\n", 1000 * (second() - start));
    }
}

/* setrealtime.c: make a pid SCHED_FIFO, or act as a wrapper (ala /bin/time).
   install suid root with care. */
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <sched.h>

int main(int argc, char *argv[])
{
    int uid = getuid();
    int pid = atoi(argv[1]);    /* non-numeric argv[1] means "wrap a command" */
    int sched_fifo_min, sched_fifo_max;
    static struct sched_param sched_parms;

    if (!pid) pid = getpid();
    sched_fifo_min = sched_get_priority_min(SCHED_FIFO);
    sched_fifo_max = sched_get_priority_max(SCHED_FIFO);
    sched_parms.sched_priority = sched_fifo_min + 1;
    if (sched_setscheduler(pid, SCHED_FIFO, &sched_parms) == -1)
        perror("cannot set realtime scheduling policy");
    if (uid) setuid(uid);       /* drop root before exec'ing the wrapped command */
    if (pid == getpid())
        execvp(argv[1], &argv[1]);
    return 0;
}
Re: /proc format (was Device Registry (DevReg) Patch 0.2.0)
> > Question: it is possible to redirect the same fs call (say read) to different
> > implementations, based on the open mode of the file descriptor ? So, if
> > you open the entry in binary, you just get the number chunk, if you open
> > it in ascii you get a pretty printed version, or a format description like
>
> There is no distinction between "text" and "binary" modes on a file
> descriptor. The distinction exists in the C stdio layer, but is a
> no-op on Unix systems.

of course. but we could trivially define O_PROC_BINARY, or an ioctl/fcntl, or even do something fancy like use lseek().

pardon my stream of consciousness here, but: I think it's well-established that proc exists for humans, and that there's no real sympathy for the eternal whines of how terribly hard it is to parse. it's NOT hard to parse, but it would be easier still if it were more consistent. the main goal at this point is to make kernel proc-related code more efficient, easy-to-use, etc.; a purely secondary goal is to make user-space tools more robust, efficient, and simpler.

there are three things that need to be communicated through the proc interface, for each chunk of data: its type, its name and its value. it's critical that data be tagged in some way, since that's the only way to permit back-compatibility. that is, a tool looking for a particular tag will naturally ignore new data with other tags.

/proc/sys is an attempt to provide tagged data; it works well, is easy to comprehend, but requires an open for each datum, and provides no hints about type. /proc/cpuinfo is another attempt: "tag : data", with no attempt to provide types. the tags have also mutated somewhat over time. /proc/partitions is an example of a record-oriented file: one line per record, and tags for the record members at the top. still no typing information.

I have a sense that all of these could be collapsed into a single api where kernel systems would register hierarchies of tuples of <tag, type, callback>, where the callback would be passed the tag, and proc code would take care of "rendering" the data into human-readable text (default), binary, or even xml. the latter would require some signalling mechanism like O_PROC_XML or the like. further, programs could perform a meta-query, where they ask for the types and tags of a datum (or hierarchy), so that on subsequent queries, they'd know how to handle binary data. if only one piece of code handled the rendering of /proc stuff, it could do more, without burdening all the disparate /proc producers.

regards, mark hahn.
Re: SMP in 2.4
Dennis is like a pie in the face: messy, unexpected, but trivial.

On Wed, 18 Apr 2001, Dennis wrote:
> Does 2.4 have something similar to spl levels or does it still require the
> ridiculous MS-DOSish spin-locks to protect every bit of code?
Re: [RFC][PATCH] Scalable FD Management using Read-Copy-Update
> > isn't this a solution in search of a problem?
> > does it make sense to redesign parts of the kernel for the sole
> > purpose of making a completely unrealistic benchmark run faster?
>
> Irrespective of the usefulness of the "chat" benchmark, it seems
> that there is a problem of scalability as long as CLONE_FILES is
> supported. John Hawkes (SGI) posted some nasty numbers on a
> 32 CPU mips machine in the lse-tech list some time ago.

that's not the point. the point is that this has every sign of being premature optimization. the "chat" benchmark does no work, it only generates load. and yes, indeed, you can cause contention if you apply enough load in the right places. this does NOT indicate that any real apps apply the same load in the same places.
Re: [RFC][PATCH] Scalable FD Management using Read-Copy-Update
> The improvement in performance while running "chat" benchmark
> (from http://lbs.sourceforge.net/) is about 30% in average throughput.

isn't this a solution in search of a problem? does it make sense to redesign parts of the kernel for the sole purpose of making a completely unrealistic benchmark run faster?

(the chat "benchmark" is a simple pingpong load-generator; it is not in the same category as, say, specweb, since it does not do *any* realistic (nonlocal) IO. the numbers "chat" returns are interesting, but not indicative of any problem; perhaps even less so than lmbench components.)
Re: memory allocation problems
> > note, though, that you *CAN* actually malloc a lot more than 1G: you
> > just have to avoid causing mmaps that chop your VM at
> > TASK_UNMAPPED_BASE:
>
> Neat trick. I didn't realize that you could avoid allocating the mmap()
> buffers for stdin and stdout.

no one ever said you had to use stdio. or even use libc, for that matter!

> As was pointed out to me in January, another solution for i386 would be to
> fix a maximum stack size and have the mmap() allocations grow downward
> from the "top" of the stack (3GB - max stack size). I'm not sure why that
> is not currently done.

problems get fixed when there's some pain involved: people bumping into a limit, or painfully bad code, etc. not enough people are feeling any pain about the current design. this (and the "move TASK_UNMAPPED_BASE" workaround) have been known for years; I think someone even coded up a "grow vmareas down" patch the last time we all discussed this.

> I once wrote a tiny patch to do this, and ran it successfully for a couple
> days, but knowing so little about the kernel I probably did it in a
> completely wrong, inefficient way. For example, some of the vma
> structures are sorted in increasing address order, and so perhaps to do
> this properly one should change them to decreasing address order.

oh, I guess you did the patch ;) seriously, resubmit it when 2.5 opens up. the fact is that we currently have two things that grow up, and one that grows down. so obviously, one up-grower must have an arbitrary limit. switching vmas to down-growing is a good solution, since it's actually *good* to limit stack growth. I wonder whether fortraners still put all their data on the stack; they wouldn't be happy ;)

a simple workaround would be to turn TASK_UNMAPPED_BASE into a variable, either system-wide or thread-specific (like ia64 already has!). that's compatible with the improved vmas-down approach, too.

regards, mark hahn.
Re: memory allocation problems
> can get at most 2GB. Newer glibc's allow you to tune the definition > of "small" via an environment variable. eventually, perhaps libc will be smart enough to create more arenas in mmaped space once sbrk fails. note, though, that you *CAN* actually malloc a lot more than 1G: you just have to avoid causing mmaps that chop your VM at TASK_UNMAPPED_BASE: #include #include #include void printnumber(unsigned n) { char number[20]; int i; for (i=sizeof(number)-1; i>=0 && n; i--) { number[i] = '0' + (n % 10); n /= 10; } i++; write(1,number+i, sizeof(number)-i); } int main() { unsigned total = 0; const unsigned size = 32*1024; while (malloc(size)) { total += size; printnumber(total>>20); write(1,"\n",1); } return 0; } compile -static, of course; printnumber is to avoid stdio, which seems to use mmap for a small scratch buffer. I allocated 2942 MB on my 128M machine(had to add a swapfile temporarily, since so many tiny mallocs do touch nontrivial numbers of pages for arena bookkeeping.) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bugreport: Kernel 2.4.x crash
> 2. A Fileserver with an ABIT Hotrod 66 (hpt366) controller will crash within
> 5-60 minutes after boot with a 2.4.x kernel. 2.2.x works fine. No other

no problem with ext2 on hpt366 here.

> Gnu C 2.95.3

hmm.
Re: SMP on assym. x86
> > > handle the situation with 2 different CPUs (AMP = Asymmetric
> > > multiprocessing ;-) correctly.
> >
> > "correctly". Intel doesn't support this (mis)configuration:
> > especially with different steppings, not to mention models.
>
> I wouldn't call it misconfiguration, just because it's a bit more difficult
> to handle.

again, I *would* call it misconfiguration. intel says explicitly that they don't support mixing model/family parts, and they only test same-clock combinations (but do support mixed steppings). just so people don't get the impression that random, different CPUs are a sure thing...
Re: SMP on assym. x86
> recently upgrading one of my two CPUs, I found kernel-2.4.2 to be unable to
> handle the situation with 2 different CPUs (AMP = Asymmetric
> multiprocessing ;-) correctly.

"correctly". Intel doesn't support this (mis)configuration: especially with
different steppings, not to mention models.

Alan has, or is working on, a workaround to handle differing multipliers by
turning off the use of RDTSC. this is the right approach to take in the
kernel: disable features not shared by both processors, so
correctly-configured machines are not penalized. and the kernel should
LOUDLY WARN ABOUT this stuff on boot.

regards, mark hahn.
Re: UDMA 100 / PIIX4 question
> Device     Boot   Start    End      Blocks   Id  System
> /dev/hda1    *        1    932    7486258+    b  Win95 FAT32
> /dev/hda2           933   3737   22531162+    5  Extended
> /dev/hda5           933    935       24066   83  Linux
> /dev/hda6           936    952      136521   82  Linux swap
> /dev/hda7           953   3737    22370481   83  Linux
>
> I also ran hdparm -tT /dev/hda1:
>
>  Timing buffer-cache reads:   128 MB in  1.28 seconds = 100.00 MB/sec
>  Timing buffered disk reads:   64 MB in  4.35 seconds =  14.71 MB/sec
>
> Which obviously gives much the same result as my usual hdparm -tT /dev/hda
>
> I then tried hdparm -tT /dev/hda7:
>
>  Timing buffer-cache reads:   128 MB in  1.28 seconds = 100.00 MB/sec
>  Timing buffered disk reads:   64 MB in  2.12 seconds =  30.19 MB/sec
>
> As you would expect, I get almost identical results with several repetitions.
>
> Does this solve the mystery ?

no, it's quite odd. hdparm -t cannot be affected by the filesystem that
lives in the partition, since hdparm is doing reads that don't go through
the filesystem.

hmm, I wonder if that's it: if you mount the FS that's in hda1, it might
change the block driver configuration (changing the blocksize, for
instance). that would affect hdparm, even though its reads don't go through
the FS.

prediction: if you comment out the hda1 line in fstab, and reboot, so that
vfat never gets mounted on that partition, I predict that hdparm will
show >30.19 MB/s on it.
Re: UDMA 100 / PIIX4 question
> > > I have an IBM DTLA 307030 (ATA 100 / UDMA 5) on an 815e board (Asus
> > > CUSL2), which has a PIIX4 controller.
> > > ...
> > > My problem is that (according to hdparm -t), I never get a better
> > > transfer rate than approximately 15.8 Mb/sec
> >
> > 15MB/s for hdparm is about right.

it's definitely not right: this disk sustains around 35 MB/s.

> Yes, since hdparm -t measures *SUSTAINED* transfers... the actual "head
> rate" of data reads from disk surface. Only if you read *only* data that
> is already in the harddrive's cache will you get a speed close to the
> UDMA mode of the drive/controller. The cache is around 1Mbyte, so for a
> split-second re-read of some data

non sequitur: the controller and disk are both quite capable of sustaining
20-35 MB/s (depending on zone.) here's hdparm output for a disk of the
same rpm and density as the DTLA's:

 Timing buffer-cache reads:   128 MB in  1.07 seconds = 119.63 MB/sec
 Timing buffered disk reads:   64 MB in  2.02 seconds =  31.68 MB/sec

(maxtor dm+45, hpt366 controller)

regards, mark hahn.
Re: scsi vs ide performance on fsync's
> itself is a bad thing, particularly given the amount of CPU overhead that
> IDE drives demand while attached to the controller (orders of magnitude
> higher than a good SCSI controller) - the more overhead we can hand off to

I know this is just a troll by a scsi-believer, but I'm biting anyway.
on current machines and disks, ide costs a few % CPU, depending on which
CPU, disk, kernel, the sustained bandwidth, etc. I've measured this using
the now-trendy method of noticing how much the IO costs a separate,
CPU-bound benchmark: load = 1 - (loadedPerf / unloadedPerf). my cheesy
duron/600 desktop typically shows ~2% actual cost when running bonnie's
block IO tests.
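The measurement described above can be sketched as follows (function and
variable names are mine, not from the post; this assumes the CPU-bound
benchmark reports a work-per-second score):

```python
def io_cpu_cost(unloaded_score, loaded_score):
    """Fraction of CPU consumed by background I/O, estimated from how
    much a CPU-bound benchmark slows down while the I/O test runs.

    unloaded_score: benchmark result (work/sec) on an otherwise idle machine.
    loaded_score:   same benchmark while the I/O test (e.g. bonnie) runs.
    """
    return 1.0 - (loaded_score / unloaded_score)

# a benchmark dropping from 100.0 to 98.0 units/sec during bonnie's
# block-IO test implies roughly 2% CPU cost for the I/O.
```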
Re: 2.4.2ac8 lost char devices
> > > > > Well, something has broken in ac8, because I lost my PS/2 mouse and
> > > >
> > > > me too.
>
> No luck.

it seems to be the mdelay(2) added to pc_keyb.c in -ac6.
Re: ide / usb problem
> the cable length in mind. Anybody out there know if there's a max cable
> length for the ATA/100 spec??

18", like *all* ide/ata cables.
Re: doing RAID 0 with HPT370
> do know I get the feeling they don't care to support Linux in any way
> shape or form. Feels like a pawn off job.

afaik, there's no hardware raid support in the chip - it's just another
dual-channel controller, with some raid0 (perhaps raid1) software in bios.
I think Andre has said that he has hopes of getting docs on HPT's on-disk
raid layout - but this is a software thing, and all it would give us is
interoperability with that other OS.
Re: [PATCH/REQ] Increase kmsg buffer from 16K to 32K, kernel/printk.c
> > Would it be possible to grow and shrink that buffer on demand?
> > Let's say we have a default size and let it grow to a maximum
> > value. After some timeout, buffer size can be shrunk to the
> > default value if it's enough at that moment. Or something
> > similar.
>
> And when you can't allocate memory for expanding the
> printk() ringbuffer? Print a message? ;)

;) but seriously, we normally need a big printk buffer only because of
boot messages. no reason I know we couldn't shrink it down to something
quite modest (4k? plenty for a few oopses) after boot.
Re: VT82C686A corruption with 2.4.x
> From what I gather this chipset on 2.4.x is only stable if you cripple
> just about everything that makes it worth having (udma, 2nd ide channel
> etc etc)? does it even work when all that's done now, or is it fully
> functional?

it seems to be fully functional for some or most people, with two,
apparently, reporting major problems. my via (kt133) is flawless in 2.4.1
(a drive on each channel, udma enabled and in use) and has been for all
the 2.3's since I got it.
Re: *massive* slowdowns on 2.4.1-pre1[1|2]
> Kernel 2.4.1-pre11 and pre12 are both massively slower than 2.4.0 on the
> same machine, compiled with the same options. The machine is an Athlon
> 900 on a KT133 chipset. The slowdown is noticeable in all areas...

this is known: Linus decreed that, since two people reported disk
corruption on VIA, any machine with a VIA southbridge must boot in stupid
1992 mode (PIO). (yes, it might be possible to boot with ide=autodma or
something, but who would guess?)

Linus: I hope you don't consider this a releasable state! VIA now owns
40% of the chipset market...
Re: More on the VIA KT133 chipset misbehaving in Linux
> I am not a guru, but AOpen AK73PRO which uses VIA KT133 does not
> show any of these symptoms that you describe (I cannot be sure
> about #3 since I run ntp). You may want to make your hardware

my ga-7zm shows none of them either (I also run ntp, and the board has a
perfectly normal drift history.)
Re: Linux Post codes during runtime, possibly OT
> #ifdef SLOW_IO_BY_JUMPING
> #define __SLOW_DOWN_IO "\njmp 1f\n1:\tjmp 1f\n1:"
> #else
> -#define __SLOW_DOWN_IO "\noutb %%al,$0x80"
> +#define __SLOW_DOWN_IO "\noutb %%al,$0x19"

this is nutty: why can't udelay be used here? empirical measurements in
the thread show the delay is O(2us).
Re: multi-queue scheduler update
> > microseconds/yield
> >
> > # threads    2.2.16-22    2.4       2.4-multi-queue
> > ---------    ---------    ------    ---------------
> > 16           18.740       4.603     1.455
>
> I remember the O(1) scheduler from Davide Libenzi was beating the
> mainline O(N)

isn't the normal case (as in "The Right Case to optimize") where there
are close to zero runnable tasks? what realistic/sane scenarios have very
large numbers of spinning threads? all server situations I can think of
do not. not volanomark -loopback, surely!
Re: 2.4.1-pre1 breaks XFree 4.0.2 and "w"
> This way we are 100% consistent and we don't lose the "cpu_has" information.

but /dev/cpu/*/{msr|cpuid} are "cpu has".
Re: APIC ERRor on CPU0: 00(02) ...
> I have a Motherboard BP6 with two Celeron 500 (Not overclocked) and
> ...
> APIC error on CPU1: 00(08)
> ...
> What's wrong?

Abit designed the board wrong. there are things you can do to reduce the
incidence of this error: upgrading the bios, better cooling, a more
powerful power supply, replacing an out-of-spec capacitor (if v1.1).
jeez, it's almost like a 12-step program for recovering from BP6ing ;)

> This message doesn't appear in Kernel-2.2.17, only in Kernel-2.4

indeed: the error still happens in 2.2, but is simply not reported. note
also that this message is a *warning* - an inter-apic message was
corrupted, and automatically retried.

regards, mark hahn.
Re: IDE DMA problems on 2.4.0 with vt82c686a driver
> Since this looks like either a chipset, drive, or driver problem, I am

no: the only entities involved with udma crc's are the drive, the
controller (and the cable). the kernel is not involved in any way (except
to configure udma, of course.)

> occasionally (not often/constant, but sometimes) get CRC errors:
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

nothing wrong here. occasional crc retries cause no performance impact.

> After reading some archives in linux-kernel, I tried changing some
> options. Then I changed out the 40 pin, 80 wire cable with a new one.

great, since without the 80c cable, udma > 33 is illegal. is it safe to
assume your cable is also 18" or less, with both ends plugged in (no
stub)? you might be able to minimize CRC retries by changing where the
cable runs. it's also conceivable that CRC errors would be caused by
marginal power, bad trace layout on the motherboard, and definitely by
overclocking (PCI other than 33 MHz).

> My main concern that I haven't been able to find an answer for on any
> archives or documentation: can this cause file system corruption in any
> way?

absolutely not. udma checksums each transfer. when checksums don't match,
the *hardware* retries the transfer (and incidentally reports the event,
which Linux obligingly passes on to you.)

regards, mark hahn.
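The error=0x84 byte in the dma_intr lines above can be decoded from the
ATA error-register bit assignments. A small sketch (bit names follow the
ATA/ATAPI-5 spec; the table is mine, not from the kernel source):

```python
# ATA error-register bits, named per the ATA/ATAPI-5 specification.
ATA_ERROR_BITS = {
    0x80: "ICRC",   # interface CRC error -- the "BadCRC" in the log
    0x40: "UNC",    # uncorrectable data error
    0x10: "IDNF",   # sector ID not found
    0x04: "ABRT",   # command aborted
    0x01: "AMNF",   # address mark not found
}

def decode_ata_error(err):
    """Return the names of the bits set in an ATA error-register byte."""
    return [name for bit, name in sorted(ATA_ERROR_BITS.items()) if err & bit]

# error=0x84 decodes to ABRT + ICRC: the transfer's CRC check failed and
# the command was aborted (then retried by the hardware, as noted above).
```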
Re: Fw: Change of policy for future 2.2 driver submissions
> since Mark posted his views to the list, I figured I could safely post the
> conversation I've been having with him in email

which is universally considered rude, if not illegal. in any case, please
don't respond to this thread, which is quite off-topic.
Re: Change of policy for future 2.2 driver submissions
> I personally do not trust the 2.4.x kernel entirely yet, and would prefer to
> ...
> afraid that this may partially cripple 2.2 driver development.

egads! how can there be "development" on a *stable* kernel line? maybe this
is the time to reconsider terminology/policy: does "stable" mean "bugfixes
only"? or does it mean "development kernel for conservatives"? me, I've run
the "progressive" kernel line on production boxes since ~2.3.36.
Re: Monitoring filesystems / blockdevice for errors
> currently, there is no way for an external application to monitor whether a
> filesystem or underlying block device has hit an error condition - internal
> inconsistency, read or write error, whatever.
>
> Short of parsing syslog messages, which isn't particularly great.

what's wrong with it? reinventing /proc/kmsg and klogd would be très gross.

> I don't have a real idea how this could be added, short of adding a field to
> /proc/partitions (error count) or something similar.

for reporting errors, that might be OK, but it's not a particularly nice
_notification_ mechanism...

regards, mark hahn.
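The log-parsing approach the reply endorses can be sketched like this
(the error patterns are illustrative examples, not an exhaustive or
authoritative list; actual message formats vary by driver and kernel
version):

```python
import re

# Illustrative signatures of block-device/filesystem trouble in kernel
# log lines; a real monitor would tune these to its drivers and kernel.
BLOCKDEV_ERROR_RE = re.compile(
    r"(I/O error|BadCRC|UncorrectableError|EXT2-fs error)"
)

def scan_kernel_log(lines):
    """Yield log lines that look like filesystem/block-device errors.

    In a real monitor, `lines` would come from reading /proc/kmsg (or
    from klogd/syslog output) rather than an in-memory list.
    """
    for line in lines:
        if BLOCKDEV_ERROR_RE.search(line):
            yield line

sample = [
    "hda: dma_intr: error=0x84 { DriveStatusError BadCRC }",
    "usb.c: registered new driver hub",
]
# Only the first sample line matches.
```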
Re: Linux 2.2.18 release notes
> > - metrics -- L1 cacheline size is the important one: you align array
> ...
> Anyone can give me some pointers on how this is done at runtime? (name of
> the .c file is fine.)

kernel/sched.c:aligned_data. as mentioned elsewhere, the correct alignment
is not necessarily L1 linesize.