Re: [PATCH] User chroot
On Wed, Jun 27, 2001 at 04:55:56PM -0400, Albert D. Cahalan wrote:
> ln /dev/zero /tmp/zero
> ln /dev/hda ~/hda
> ln /dev/mem /var/tmp/README

None of these (of course) work if you use mount options to restrict device
nodes on those filesystems.

Sean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
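The mount option in question is nodev, which tells the kernel not to
interpret device special files on that filesystem. As a rough sketch of how
one might check which mount points carry it, the Python below parses text in
the format of /proc/mounts. The sample table and the function names are
made up for illustration; they are not taken from the thread.

```python
def mount_options(mounts_text):
    """Map each mount point to its set of options, given /proc/mounts-style text."""
    table = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3]
        table[mount_point] = set(options.split(","))
    return table


def allows_device_nodes(mounts_text, mount_point):
    """True if the filesystem at mount_point is NOT mounted with nodev."""
    return "nodev" not in mount_options(mounts_text)[mount_point]


# Hypothetical mount table, in the format of /proc/mounts.
SAMPLE = """\
/dev/hda1 / ext2 rw 0 0
/dev/hda2 /tmp ext2 rw,nodev,nosuid 0 0
/dev/hda3 /home ext2 rw,nodev 0 0
"""

for point in ("/", "/tmp", "/home"):
    print(point, "device nodes honoured:", allows_device_nodes(SAMPLE, point))
```

On a live system the same functions could be fed the contents of
/proc/mounts instead of SAMPLE.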
Re: 2.2.x series and mm
On Wed, Jun 27, 2001 at 05:27:11PM +0100, Alan Cox wrote:
> > I'm fairly sure it is the file buffers as the apache is already
> > reniced to 20, it is got max 50 processes and each of processes is
> > limited to like 1.5mb of size via ulimit.
>
> nice wont help you, it controls scheduling priority. Similar a ulimit just
> ensures that no apache process goes mad and eats lots of memory (good idea
> but not helpful here). If your working set (and thats the bit the matters)
> really is exceeding memory by a fair bit then
>
> a)Add more RAM - that is the real optimal approach
> b)Make the processes smaller (eg switch to thttpd from www.acme.com)
> c)Speed up the I/O throughput relative to CPU speed
>	- eg the 2.2 IDE UDMA patches

It may also be worth considering

d) Reduce the number of Apache processes so they fit nicely in RAM

Sean
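Option d) comes down to simple arithmetic. The sketch below uses the two
figures from the thread (50 processes, about 1.5 MB each); the 64 MB
machine and the 16 MB held back for the kernel and caches are invented
numbers, and a ulimit caps process size rather than measuring the real
working set, so treat the result as a rough upper bound only.

```python
def max_processes_that_fit(ram_mb, reserved_mb, per_process_mb):
    """Largest process count whose combined footprint fits in usable RAM."""
    usable_mb = ram_mb - reserved_mb
    if usable_mb <= 0 or per_process_mb <= 0:
        return 0
    return int(usable_mb // per_process_mb)


# Figures from the thread: 50 processes capped at about 1.5 MB each.
worst_case_mb = 50 * 1.5  # footprint if every process reaches its ulimit

# Hypothetical machine: 64 MB of RAM, 16 MB kept for the kernel and caches.
print("worst case footprint (MB):", worst_case_mb)
print("processes that fit:", max_processes_that_fit(64, 16, 1.5))
```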
Re: Break 2.4 VM in five easy steps
On Wed, Jun 06, 2001 at 10:57:57AM +0100, Dr S.M. Huen wrote:
> On Wed, 6 Jun 2001, Sean Hunter wrote:
>
> > For large memory boxes, this is ridiculous.  Should I have 8GB of swap?
>
> Do I understand you correctly?
> ECC grade SDRAM for your 8GB server costs £335 per GB as 512MB sticks even
> at today's silly prices (Crucial). Ultra160 SCSI costs £8.93/GB as 73GB
> drives.
>
> It will cost you 19x as much to put the RAM in as to put the
> developer's recommended amount of swap space to back up that RAM.  The
> developers gave their reasons for this design some time ago and if the
> ONLY problem was that it required you to allocate more swap, why should
> it be a priority item to fix it for those that refuse to do so?   By all
> means fix it urgently where it doesn't work when used as advised but
> demanding priority to fixing a problem encountered when a user refuses to
> use it in the manner specified seems very unreasonable.  If you can afford
> 4GB RAM, you certainly can afford 8GB swap.
>

This is completely bogus. I am not saying that I can't afford the swap.
What I am saying is that it is completely broken to require this amount
of swap given the boundaries of efficient use.

This is only one of several things which make the 2.4 VM suck for large,
small or medium machines at the moment.  Until we have a working VM, 2.4
can't possibly go into production on my site on these machines.

A working VM would have several differences from what we have in my opinion,
among which are:
        - It wouldn't require 8GB of swap on my large boxes
        - It wouldn't suffer from the "bounce buffer" bug on my
          large boxes
        - It wouldn't cause the disk drive on my laptop to be
          _constantly_ in use even when all I have done is spawned a
          shell session and have no large apps or daemons running.
        - It wouldn't kill things saying it was OOM unless it was OOM.

Furthermore, I am not demanding anything, much less "priority fixing"
for this bug. It's my personal opinion that this is the most critical bug
in the 2.4 series, and if I had the time and skill, this is what I would
be working on. Because I don't have the time and skill, I am perfectly
happy to wait until those that do fix the problem. To say it isn't a
problem because I can buy more disk is nonsense, and it's that sort of
thinking that leads to constant need to upgrade hardware in the
proprietary OS world.

Sean
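For reference, the "19x" figure in the quoted message can be reproduced
from the two prices it gives, assuming swap is sized at twice RAM (the
guideline under discussion). The function name and the swap_multiple
parameter are illustrative choices, not anything from the thread.

```python
RAM_PRICE_PER_GB = 335.00   # GBP per GB, ECC SDRAM price quoted in the thread
DISK_PRICE_PER_GB = 8.93    # GBP per GB, Ultra160 SCSI price quoted in the thread


def ram_vs_swap_cost(ram_gb, swap_multiple=2):
    """Return (ram_cost, swap_cost, ratio) for swap sized at swap_multiple * RAM."""
    ram_cost = ram_gb * RAM_PRICE_PER_GB
    swap_cost = ram_gb * swap_multiple * DISK_PRICE_PER_GB
    return ram_cost, swap_cost, ram_cost / swap_cost


ram_cost, swap_cost, ratio = ram_vs_swap_cost(8)
print("RAM: %.2f  swap: %.2f  ratio: %.1f" % (ram_cost, swap_cost, ratio))
```

The ratio does not depend on the amount of RAM, only on the two prices
and the swap multiple, which is why the price argument says nothing
about whether the requirement itself is sensible.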
Re: Break 2.4 VM in five easy steps
On Wed, Jun 06, 2001 at 11:16:27AM +0200, Xavier Bestel wrote:
> On 06 Jun 2001 09:54:31 +0100, Sean Hunter wrote:
> > > This is what Linus recommended for 2.4 (swap = 2 * RAM), saying that
> > > anything less won't do any good: 2.4 overallocates swap even if it
> > > doesn't use it all. So in your case you just have enough swap to map
> > > your RAM, and nothing to really swap your apps.
> > >
> >
> > For large memory boxes, this is ridiculous.  Should I have 8GB of swap?
>
> Life is tough. If guess if you have 4GB RAM, you'd be better having no
> swap at all. Or, yes, at least 8GB.
> Or just wait for this bug to be fixed. But be patient.

This is just pure bollocks.  Virtual memory is one of the killer features of
unix.  It would be a strange admission to say that our "advanced" 2.4 kernel is
so advanced that now you can't use virtual memory at all on large machines.

Needing 8GB of swap to prevent a box from committing suicide when it has a
working set of less than 512M is crazy.

I am waiting patiently for the bug to be fixed. However, it is a real
embarrassment that we can't run this "stable" kernel in production yet because
something as fundamental as this is so badly broken.

Sean
Re: Break 2.4 VM in five easy steps
On Wed, Jun 06, 2001 at 10:19:30AM +0200, Xavier Bestel wrote:
> On 05 Jun 2001 23:19:08 -0400, Derek Glidden wrote:
> > On Wed, Jun 06, 2001 at 12:16:30PM +1000, Andrew Morton wrote:
> > > "Jeffrey W. Baker" wrote:
> > > >
> > > > Because the 2.4 VM is so broken, and
> > > > because my machines are frequently deeply swapped,
> > >
> > > The swapoff algorithms in 2.2 and 2.4 are basically identical.
> > > The problem *appears* worse in 2.4 because it uses lots
> > > more swap.
> >
> > I disagree with the terminology you're using.  It *is* worse in 2.4,
> > period.  If it only *appears* worse, then if I encounter a situation
> > where a 2.2 box has utilized as much swap as a 2.4 box, I should see the
> > same results.  Yet this happens not to be the case.
>
> Did you try to put twice as much swap as you have RAM ? (e.g. add a 512M
> swapfile to your box)
> This is what Linus recommended for 2.4 (swap = 2 * RAM), saying that
> anything less won't do any good: 2.4 overallocates swap even if it
> doesn't use it all. So in your case you just have enough swap to map
> your RAM, and nothing to really swap your apps.
>

For large memory boxes, this is ridiculous.  Should I have 8GB of swap?

Sean
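To see what the "swap = 2 * RAM" guideline means for a given box, one can
compare the MemTotal and SwapTotal lines of /proc/meminfo. The sketch
below works on meminfo-style text; the sample describes a hypothetical
4GB machine with 2GB of swap, and the helper names are invented for this
example.

```python
def parse_meminfo(text):
    """Return {field: kilobytes} for 'Name:   value kB' lines."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_shortfall_kb(meminfo_text, multiple=2):
    """Kilobytes of swap still missing to reach multiple * RAM (0 if enough)."""
    info = parse_meminfo(meminfo_text)
    wanted = multiple * info["MemTotal"]
    return max(0, wanted - info["SwapTotal"])


# Hypothetical 4GB machine with 2GB of swap, in /proc/meminfo format.
SAMPLE = """\
MemTotal:      4194304 kB
MemFree:        524288 kB
SwapTotal:     2097152 kB
SwapFree:      2097152 kB
"""

shortfall_gb = swap_shortfall_kb(SAMPLE) / (1024 * 1024)
print("swap short of 2 * RAM by %.1f GB" % shortfall_gb)
```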
Re: Break 2.4 VM in five easy steps
On Tue, Jun 05, 2001 at 09:42:26PM -0400, Russell Leighton wrote:
>
> I also need some 2.4 features and can't really goto 2.2.
> I would have to agree that the VM is too broken for production...looking
> forward to the work that (hopefully) will be in 2.4.6 to resolve these issues.
>

Boring to do a "me too", but "me too".  We have four big production oracle
servers that could use 2.4.  However, the test server we have put 2.4 on has
no end of ridiculous VM and OOM problems.  It seems bizarre that a 4GB machine
with a working set _far_ lower than that should be dying from OOM and swapping
itself to death, but that's life in 2.4 land.

Sean
Re: Linux scalability?
Yup.  The problem is that you're trying to measure scalability in performance
of an i/o-bound task by comparing a machine with greater i/o resource but less
processing power with one with greater processing but poorer i/o.
Surprisingly enough, the one with the best i/o wins.

This isn't really a fair comparison between the two platforms.  If you put the
same disk array on both machines and got the same results, then you'd have a
point.

My point was that in the real world having this configuration for a webserver
is unlikely to be sensible at all.

Sean

On Sat, May 19, 2001 at 10:31:01AM +0200, Sasi Peter wrote:
> On Fri, 18 May 2001, Sean Hunter wrote:
>
> > Why would you want to run a web server with 8 processors rather than four
> > webservers with 2 each?
>
> As you might already know, after the interviews to Mingo I assumed, that a
> major portion of the achievements was enabled by the 2.4 scalability
> enhacements. That is why I wrote to LKML, to ask about the 2.4
> scalability, if anybody out there could tell us about the linux kernel's
> scalability possibily compared to W2k scalability...
>
> --
> SaPE - Peter, Sasi - mailto:[EMAIL PROTECTED] - http://sape.iq.rulez.org/
Re: Linux scalability?
Why would you want to run a web server with 8 processors rather than four
webservers with 2 each?

Sean

On Fri, May 18, 2001 at 09:24:48AM +0200, Sasi Peter wrote:
> Hi!
>
> I am just writing an essay, an have mentioned TUX as a performance and
> scalability linearity recort holder with TUX, referencing the specweb99
> website summary page:
>
> http://www.spec.org/osg/web99/results/web99.html
>
> However, taking a closer look, it turns out, that the above statement
> holds true only for 1 and 2 processor machines. Scalability already
> suffers at 4 processors, and at 8 processors, TUX 2.0 (7500) gets beaten
> by IIS 5.0 (8001), and these were measured on the same kind of box!
>
> How come, TUX is s good at the lowend (1 and 2 CPUs), and scales this
> bad?
>
> --
> SaPE - Peter, Sasi - mailto:[EMAIL PROTECTED] - http://sape.iq.rulez.org/
Re: just-in-time debugging?
My approach is something like the others.  I developed a small wrapper to
catch unaligned traps on alpha.  What it does is run a program in gdb with
some specified arguments (it also sets things up so that the process gets a
SIGBUS when it does an unaligned access, but that's probably not relevant
here).

In any case, it's available by anonymous ftp at ftp://uncarved.com/unaligned.c
in case you're interested...

Sean

On Sat, Apr 28, 2001 at 09:17:10PM +0100, Tony Hoyle wrote:
> Is there a way (kernel or userspace... doesn't matter) that gdb/ddd
> could be invoked when a program is about
> to dump core, or perhaps on a certain signal (that the app could deliver
> to itself when required).  The latter case
> is what I need right now, as I have to debug an app that breaks
> seemingly randomly & I need to halt when
> certain assertions fail.  Core dumps aren't much use as you can't resume
> them, otherwise I'd just force a segfault
> or something.
>
> I had a look at the do_coredump stuff and it looks like it could be
> altered to call gdb in the same way that
> modprobe gets called by kmod... however I don't sufficiently know the
> code to work out whether it'd work properly
> or not.
>
> A patch to glibc would perhaps be better, but I know that code even
> less!
>
> Something like responding to SIGTRAP would probably be ideal.
>
> Tony
>
> --
>
> "Two weeks before due date, the programmers work 22 hour days cobbling an
>  application from... (apparently) one programmer bashing his face into the
>  keyboard." -- Dilbert
>
> [EMAIL PROTECTED]  http://www.nothing-on.tv
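The general shape of such a wrapper (start the target under gdb with its
arguments already supplied) can be sketched as follows. This is not the
contents of unaligned.c, which is not reproduced in the thread; it is a
minimal Python illustration that relies only on gdb's -ex and --args
options, and the program name and arguments shown are hypothetical.

```python
import os


def gdb_argv(program, program_args):
    """Build a gdb command line that starts `program` under the debugger."""
    return ["gdb", "-ex", "run", "--args", program] + list(program_args)


def run_under_gdb(program, program_args):
    """Replace the current process with gdb running the given program."""
    command = gdb_argv(program, program_args)
    os.execvp(command[0], command)  # only returns (by raising) on failure


# Hypothetical target program and arguments, printed rather than executed.
print(" ".join(gdb_argv("./myapp", ["--config", "test.conf"])))
```

Because the program runs under gdb from the start, a signal such as
SIGTRAP or the SIGABRT raised by a failed assert() stops it in the
debugger with the process still alive, rather than leaving only a core
file.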
Re: linux and high volume web sites
Also make sure you aren't suffering database lock contention from MySQL.  This
causes very fast context switching on the database server, which is then
typically unable to do useful work even though its load average is not high.
"vmstat" is useful here.

Sean

On Sat, Apr 28, 2001 at 01:55:01PM -0700, Tim Moore wrote:
> David Lang wrote:
> >
> > watch the resonate heartbeat and see if it is getting lost in the network
> > traffic (the resonate logs will show missing heartbeat packets). think
> > seriously of setting the resonate stuff to run at a higher priority so
> > that it doesn't get behind.
> >
> > depending on how high your network traffic is seriously look at putting in
> > a second nic and switch to move the NFS traffic off the network that has
> > the internet traffic and hearbeat.
> >
> > I had the same problem with central dispatch a couple years ago when first
> > implementing it. the exact details of the problem that I ran into should
> > have been fixed by now (mostly having to do with large number of virtual
> > IP addresses) but the symptoms were the same.
>
> In addition to the above make sure there's enough bandwidth to the filer
> (eg- good switches, multiple ethernets).
>
> Consider moving to 2.2.19.  Significant VM changes after 2.2.19pre3 which
> could account for the freezes.
>
> rgds,
> tim.
>
> > > I have a high volume web site under linux :
> > > kernel is 2.2.17
> > > hardware is 5 bi-PIII 700Mhz / 512Mb, eepro100
> > > all server are diskless (nfs on an netapp filer) except for tmp and swap
> > >
> > > dispatch is done by the Resonate product
> > >
> > > web server is apache+php (something like 400 processes), database
> > > backend is a mysql on the same hardware
> > >
> > > in high volume from time to time machines are "freezing" then after a
> > > few seconds they "reappear" and response timne is
> > >
> > >
> > > how can I investigate all these problems ?
> --
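The context-switch rate that vmstat reports can also be derived by hand
from the cumulative "ctxt" counter in /proc/stat, sampled twice. The
sketch below does that on /proc/stat-style text; the two samples and the
5 second interval are invented, and what counts as "very fast" depends on
the machine.

```python
def context_switches(stat_text):
    """Return the cumulative context-switch count from /proc/stat-style text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no ctxt line found")


def switches_per_second(before_text, after_text, interval_seconds):
    """Average context switches per second between two samples."""
    delta = context_switches(after_text) - context_switches(before_text)
    return delta / interval_seconds


# Hypothetical samples taken 5 seconds apart.
BEFORE = "cpu  100 0 50 1000\nctxt 1200000\nbtime 988000000\n"
AFTER = "cpu  110 0 60 1040\nctxt 1450000\nbtime 988000000\n"

print("context switches/s:", switches_per_second(BEFORE, AFTER, 5))
```

A rate that stays very high while the CPU is mostly idle is the pattern
described above: processes are being scheduled constantly but spend
their time waiting on locks.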
Re: [PATCH] Single user linux
On Tue, Apr 24, 2001 at 07:44:17PM +0700, [EMAIL PROTECTED] wrote:
> with multi-user concept, conceptually there should be an
> administrator to create account, grant permission, etc.
> no my sister doesn't want that. i bet there are billions of
> people not willing to learn how to use a computer, they just
> want to use it.

So they buy Macs.  <- This is not a joke or a criticism.  My wife is a happy
and contented ignorant mac user.

[snippage]

> so what the hell is transmeta doing with mobile linux (midori).
> is it going to teach multi-user thing to tablet owners?
> surely mortals expect midori to behave like their pc. lets say
> on redhat, they have to login as root to access their files,
> they don't even know what a root is!
>
> lets break unix mind for a while, and give everyone a chance
> to use linux.
>

If you wanted to do this, the correct place would be to alter your pam config,
but then again, if you knew the slightest thing about unix, you'd know that.

Sean
Re: [PATCH] pedantic code cleanup - am I wasting my time with this?
On Mon, Apr 23, 2001 at 05:26:27PM +0200, Jesper Juhl wrote:
> All the above does is to remove the last comma from 3 enumeration lists.
> I know that gcc has no problem with that, but to be strictly correct the
> last entry should not have a trailing comma.
>

Sadly not. This isn't a gcc thing: ANSI says that a trailing comma is ok
(K&R Second edition, A8.7 - pg 218 & 219 in my copy).

Sean
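For reference, a small sketch of the two places a trailing comma can appear. One caveat, as far as I can tell: section A8.7 of K&R2 covers initializer lists, where C89 already allows the trailing comma, whereas for enumerator lists the trailing comma only became standard with C99 (gcc accepted it earlier as an extension). The names `color`, `primes` and `prime_count` are made up for the example.

```c
/* Enumerator list: the trailing comma is standard from C99 onwards. */
enum color {
    COLOR_RED,
    COLOR_GREEN,
    COLOR_BLUE,   /* trailing comma after the last enumerator */
};

/* Initializer list: the trailing comma has been valid since C89. */
const int primes[] = { 2, 3, 5, 7, };

/* Number of elements in primes[]; the trailing comma adds no element. */
int prime_count(void)
{
    return (int)(sizeof primes / sizeof primes[0]);
}
```

Building this with `gcc -std=c89 -pedantic` should warn about the enum but not about the array, which is one way to see the difference.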
Re: OOM killer???
On Thu, Mar 29, 2001 at 01:01:54PM +0200, Guest section DW wrote:
> [Never use planes where the company's engineers spend their
> time designing algorithms for selecting which passenger
> must be thrown out when the plane is overloaded.]

This is (as far as I can see) a fantastically specious argument. A plane is
designed to function in an entirely constrained mode of operation and in an
entirely well-understood and circumscribed problem space, whereas a linux
host is a general-purpose device which can be used for many different
applications.

The reason the aero engineers don't need to select a passenger to throw out
when the plane is overloaded is simply that the plane operators do not allow
the plane to become overloaded. If I put 100 people in a trilander it may
take off, but won't fly, and will probably crash. The plane's designers don't
have to do anything about that - we do.

Furthermore, why do you suppose an aeroplane has more than one altimeter,
artificial horizon and compass? Do you think it's because they are unable to
make one of each that is reliable? Or do you think it's because they are
concerned about what happens if one fails _however unlikely that is_?

In fact, aeroplane engineers do design in ways of mitigating the effects of
all kinds of failures, including lessening the impact of a crash (directly
analogous to our OOM killer). For example, they provide means of jettisoning
fuel prior to crash landing to attempt to minimise explosions.

Risk management is about lessening impact as well as lessening probability.
If something is important, you don't only make it work as well as you can,
you mitigate the effect of failure. A reliable system is not just a strong
belt, it is belt, braces, suspenders and bicycle clips.

I have seen the OOM killer in operation three times on our production
servers. In each case it kept the machine alive in the face of hostile
runaway processes. I don't want to see things killed, but if that is the only
way to keep the host alive, I vote to keep it alive.

When I'm on a plane, I want more than one engine _and_ lifejackets.

Sean
Re: Disturbing news..
On Wed, Mar 28, 2001 at 06:08:15AM -0600, Jesse Pollard wrote:
> Sure - very simple. If the execute bit is set on a file, don't allow
> ANY write to the file. This does modify the permission bits slightly
> but I don't think it is an unreasonable thing to have.
>

Are we not then in the somewhat zen-like state of having an "rm" which can't
"rm" itself without needing to be made non-executable so that it can't
execute?

Sean
Re: binfmt_script and ^M
I propose
/proc/sys/kernel/im_too_lame_to_learn_how_to_use_the_most_basic_of_unix_tools_so_i_want_the_kernel_to_be_filled_with_crap_to_disguise_my_ineptitude

Any support?

Sean

On Tue, Mar 06, 2001 at 02:45:51PM -, Laramie Leavitt wrote:
> > Andreas Schwab wrote:
> > > Paul Flinders <[EMAIL PROTECTED]> writes:
> > > |> Andreas Schwab wrote:
> > > |>
> > > |> > This [isspace('\r') == 1] has no significance here. The right
> > > |> > thing to look at is $IFS, which does not contain \r by default.
> > > |> > The shell only splits words by "IFS whitespace", and the kernel
> > > |> > should be consistent with it:
> > > |> >
> > > |> > $ echo -e 'ls foo\r' | sh
> > > |> > ls: foo: No such file or directory
> > > |>
> > > |> The problem with that argument is that #!<interpreter> can be
> > > |> applied to more than just shells which understand $IFS, so which
> > > |> environment variable does the kernel pick?
> > >
> > > The kernel should use the same default value of IFS as the Bourne shell,
> > > ie. the same value you'll get with /bin/sh -c 'echo "$IFS"'. This is
> > > independent of any settings in the environment.
> > >
> > > |> It's a difficult one - logically white space should terminate the
> > > |> interpreter
> > >
> > > No, IFS-whitespace delimits arguments in the Bourne shell.
> >
> > Way back whenever processing #! was moved from the
> > shell to the kernel** this argument would have made sense -
> > today I'm not so sure.
> >
> > But I'm quite happy for the kernel to use just space and
> > tab if it wishes, or anything else for that matter but it _is_
> > confusing that the error code doesn't distinguish problems
> > with the script from problems with the interpreter.
> >
> > **Did linux ever rely on the shell for this?
>
> Maybe the correct answer would be to create a proc entry for this.
> That would allow the user to decide what is whitespace on his machine,
> since nobody here appears to agree.
>
> User: hmm... Wonder what happens if i do the following
> %cat '$#! \n\t\r' > /proc/whitespace
> later, % config.sh : Error file not found.
> Oops, bug report... ;-)
>
> Laramie
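To make the ^M problem in this thread concrete, here is a simplified sketch of splitting the interpreter path out of a "#!" line when only space and tab count as separators. It is NOT the kernel's binfmt_script code; `parse_interp` is a hypothetical helper written for illustration.

```c
#include <string.h>

/*
 * Simplified sketch, not the kernel's binfmt_script code: copy the
 * interpreter path from a "#!" line into out, treating only space and
 * tab as separators and newline as the end of the line. Returns 0 on
 * success, -1 if the line does not start with "#!" or the path is
 * empty or does not fit in out.
 */
int parse_interp(const char *line, char *out, size_t outlen)
{
    size_t n;

    if (strncmp(line, "#!", 2) != 0)
        return -1;
    line += 2;
    line += strspn(line, " \t");     /* skip blanks after "#!" */
    n = strcspn(line, " \t\n");      /* path ends at a blank or end of line */
    if (n == 0 || n >= outlen)
        return -1;
    memcpy(out, line, n);
    out[n] = '\0';
    return 0;
}
```

With a DOS-style line such as "#!/bin/sh\r\n" this yields the path "/bin/sh\r", which does not exist, so the exec fails with the same "not found" error the script itself would produce if it were missing. That is the confusion Paul describes.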
Re: 2.2 -> 2.4: /proc/net/tcp 10x slower ?
The identd wot I wrote is still fast as anything on 2.4 :)

As you can see from this teeny sample of my ident log, I take just a little
over 1/100th of a second to respond (on average). :)

2001-02-25 16:18:35.714731500 Q [194.75.152.225] - [32907, 25]
2001-02-25 16:18:35.726085500 A [194.75.152.225] - [9a0c62e79c0df893bb96dd74/3a99305b/b0164] for [32907, 25] UID [506]
2001-02-26 09:41:02.535514500 Q [195.92.249.252] - [33363, 21]
2001-02-26 09:41:02.548884500 A [195.92.249.252] - [8c0babd7b8ab6830b7092839/3a9a24ae/8454c] for [33363, 21] UID [500]

By the way, the intention of my ident server was not to be fast, but just to
be a little simpler and less over-engineered than pidentd, and not to give
out any site-specific information (uids etc). The speed was a bonus.

Sean

On Mon, Feb 26, 2001 at 03:12:01PM +0100, Sven Rudolph wrote:
> Usually identd's on Linux parse /proc/net/tcp.
>
> When migrating from Linux 2.2.17 to 2.4.2 identd became much slower.
>
> I traced it back to the point where /proc/net/tcp is read.
>
> On the same slightly loaded system:
>
> 2.2.17 $ time cat /proc/net/tcp >/dev/null
> real    0m0.004s
> user    0m0.000s
> sys     0m0.010s
>
> (Or sometimes 0.000s due to granularity)
>
> 2.4.2 $ time cat /proc/net/tcp >/dev/null
> real    0m0.083s
> user    0m0.000s
> sys     0m0.080s
>
>
> Is this expected? Or is there a more efficient interface that identd
> should use?
>
> Sven
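For readers who have not looked at it, this is roughly what "parsing /proc/net/tcp" involves for an identd: find the line whose local and remote ports match the query and report the owning uid. The sketch below is an assumption-laden illustration, not code from either identd mentioned here; `parse_tcp_line` is a hypothetical helper, and the column layout (sl, local, remote, state, queues, timer, retransmits, uid) is the one I know from the proc documentation.

```c
#include <stdio.h>

/*
 * Hypothetical helper: pull the local port, remote port and owning uid
 * out of one /proc/net/tcp data line. Addresses and ports are printed
 * in hex, the uid in decimal. Returns 0 on success, -1 if the line does
 * not look like a data line (for example the column header).
 */
int parse_tcp_line(const char *line, unsigned *lport, unsigned *rport,
                   unsigned *uid)
{
    unsigned laddr, raddr, state, timer;
    unsigned long txq, rxq, when, retr;
    int n;

    n = sscanf(line, " %*d: %x:%x %x:%x %x %lx:%lx %x:%lx %lx %u",
               &laddr, lport, &raddr, rport, &state,
               &txq, &rxq, &timer, &when, &retr, uid);
    return n == 11 ? 0 : -1;
}
```

An identd would call this on every line read with `fgets()` until the port pair from the query matches, which is why the cost of generating the whole file matters to Sven's timings.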
Re: random PID generation
I have already written a 2.2 implementation which does not suffer from these
problems. It was rejected because Alan Cox (and others) felt it only provided
security through obscurity.

Sean

On Fri, Feb 23, 2001 at 11:40:37PM +0800, Matt Johnston wrote:
> OpenBSD has a working implementation, might be worth looking at???
>
> Cheers,
> Matt Johnston.
>
> On Fri, 23 Feb 2001 23:34, Heusden, Folkert van wrote:
> > >> My code runs through the whole task_list to see if a chosen pid is
> > >> already in use or not.
> > >
> > > But it doesn't check for a recently used PID. Let's say your system is
> > > exhausting 1000 PIDs/second, and that there is a window of 20ms between
> > > you determining which PID to send to, and the recipient process
> > > receiving it.
> >
> > Ah, I get your point. Good point :o)
> >
> > I was thinking: I could split the PIDs up in 2...16383 and 16384-32767
> > and then switch between them when a process ends? nah, that doesn't
> > help it.
> > hmmm.
> > I think random increments (instead of last_pid+1) would be the best thing
> > to do then?
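Folkert's "random increments" idea from the quoted text could be sketched roughly as follows. This is a userspace illustration only; it is not Sean's 2.2 patch, Folkert's code or the OpenBSD implementation, and `next_pid`, `PID_MIN`, `MAX_STEP` and the `in_use` table are all assumptions made for the example. The caller supplies the random value, so the sketch says nothing about where good randomness should come from.

```c
#define PID_MIN   300     /* assumed floor when wrapping, low PIDs left alone */
#define PID_MAX   32768   /* assumed size of the PID space */
#define MAX_STEP  64      /* assumed upper bound on the random increment */

/*
 * Pick the next PID by stepping a random distance past last_pid, then
 * walking forward until a free slot is found. in_use[pid] is non-zero
 * for PIDs that are taken. Returns the chosen PID, or -1 if every PID
 * in [PID_MIN, PID_MAX) is in use.
 */
int next_pid(int last_pid, unsigned rnd, const unsigned char *in_use)
{
    int pid = last_pid + 1 + (int)(rnd % MAX_STEP);
    int tries;

    for (tries = 0; tries < PID_MAX - PID_MIN; tries++, pid++) {
        if (pid >= PID_MAX || pid < PID_MIN)
            pid = PID_MIN;            /* wrap around */
        if (!in_use[pid])
            return pid;
    }
    return -1;
}
```

Note that this still only checks PIDs currently in use. The "recently used PID" window raised in the thread would need extra bookkeeping, for instance a short list of PIDs that exited recently.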
Re: Alpha: bad unaligned access handling
On Wed, Feb 14, 2001 at 03:38:33PM -0200, Carlos Carvalho wrote:
> Sean Hunter ([EMAIL PROTECTED]) wrote on 14 February 2001 17:26:
> >This is an application problem, not a kernel one. You need to upgrade
> >your netkit.
>
> Yes, I was quite confident of this. However, unaligned traps are a
> frequent problem with alphas. For a looong time we had zsh produce
> lots of it, to the point of making it unusable. Strangely, the problem
> disappeared without changing anything in zsh. It was either a library
> or kernel problem.

Definitely library, I'd think.

>
> >P.S. I wrote a small wrapper to aid in the debugging of unaligned
> >traps, which I'll send to anyone who's interested.
>
> I'd like it!
>

OK, my alpha is a sick bunny at the moment, so I'll have to wait until I get
home (so I can see why I can't ssh to it).

What the wrapper does is change the relevant settings so your program gets
SIGBUS when it generates an unaligned trap, and then runs your program in
gdb so gdb helpfully stops at the line which generated the trap. It goes
without saying you need to build the program in question with debugging
symbols so that you see the code.

You then need to fix the unaligned access. This sometimes requires real alpha
guruhood (which I do not possess, but Richard Henderson or Michal Jaegermann
do, if you need advice), but sometimes simply requires adding
__attribute__ ((__unaligned__)) to a struct member in a C file.

Sean
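Where annotating the struct member is not an option, a portable way to avoid the trap is to copy the bytes instead of dereferencing a misaligned pointer. This is a swapped-in technique, not what Sean's wrapper or the netkit fix does; `load_u32` and `store_u32` are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from an address that may not be 4-byte aligned. */
uint32_t load_u32(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof v);   /* byte copy, no aligned load required */
    return v;
}

/* Write a 32-bit value to an address that may not be 4-byte aligned. */
void store_u32(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

Compilers typically turn these small fixed-size copies into whatever unaligned-safe instruction sequence the target offers, so the cost is usually modest compared with taking a trap on every access.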
Re: Alpha: bad unaligned access handling
On Wed, Feb 14, 2001 at 03:11:17PM -0200, Carlos Carvalho wrote:
> Jan-Benedict Glaw ([EMAIL PROTECTED]) wrote on 14 February 2001 15:48:
> >With my currently installed ping (netkit-ping 0.10-6 from Debian Woody)
> >I get unaligned accesses:
> >
> >ping(15953): unaligned trap at 0001200030e4: 000120026b34 29 1
> >ping(15953): unaligned trap at 000120003110: 000120026b2c 29 2
> >
> >The worse part is: they seem to be handled The Wrong Way:
> >
> >[jbglaw@air:/home/jbglaw] $> ping -c 1 localhost
> >PING localhost (127.0.0.1): 56 data bytes
> >64 bytes from 127.0.0.1: icmp_seq=0 ttl=255 time=13.8 ms
> >wrong data byte #8 should be 0x8 but was 0xdc
> >c d e f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 26 27 28 29 2a 2b
> >2c 2d 2e 2f 0 0 0 0 0 0 0 0 0 0 0 0
> >
> >--- localhost ping statistics ---
> >1 packets transmitted, 1 packets received, 0% packet loss
> >round-trip min/avg/max = 13.8/13.8/13.8 ms
> >
> >
> >This is on a NoName Alpha box, running 2.4.0-test8-pre1 (with very good
> >uptimes), but I think 2.4.2-pre2 would do the same (wrong) things as
> >arch/alpha/kernel/traps.c wasn't really changed since ages...
>
> I also get these, with 2.2.18pre5 (plus some Andrea patches) and
> vanilla 2.2.19pre10 on a SMP UP2000.

This is an application problem, not a kernel one. You need to upgrade your
netkit.

Sean

P.S. I wrote a small wrapper to aid in the debugging of unaligned traps,
which I'll send to anyone who's interested.
Re: PCI-PCI bridges mess in 2.4.x
On Thu, Nov 09, 2000 at 04:31:24PM -0700, Michal Jaegermann wrote:
> On Thu, Nov 09, 2000 at 11:33:47AM -0500, Wakko Warner wrote:
> > > It was posted to lkml, so no link (except if you want to dig through
> > > lkml mail archives).
> >
> > It booted but then it oops'ed before userland I believe. I tried it this
> > morning and didn't have much time. It did find the scsi controller (which
> > is across the bridge) and the drives attached so it does appear to be
> > working.
>
> Looks so far that I am the worst off. If I am trying to boot with
> a root on a SCSI device then either a controller is misdetected,
> or goes into an infinite "abort/reset" loop, or it does not initialize
> properly and disks are not found. This is a non-exclusive, logical,
> "or". :-)

Me too! Exact same symptoms on my ruffian.

Sean
Re: PCI-PCI bridges mess in 2.4.x
Hi Richard.

I'm _very_ keen to try this (my Alpha won't boot 2.4 at the mo), however I
think the attachments faery has been playing tricks again.

Do you have a patch relative to 2.4.0-test10?

Sean

On Wed, Nov 08, 2000 at 01:39:31AM -0800, Richard Henderson wrote:
> [ For l-k, the issue is that pci-pci bridges and the devices behind
>   them are not initialized properly. There are a number of Alphas
>   whose built-in scsi controllers are behind such a bridge preventing
>   these machines from booting at all. Ivan provided an initial
>   patch to solve this issue. ]
>
> I've not gotten a chance to try this on the rawhide yet,
> but I did give it a whirl on my up1000, which does have
> an agp bridge that acts like a pci bridge.
>
> Notable changes from your patch:
>
>   * Use kmalloc, not vmalloc. (ouch!)
>   * Replace cropped found_vga detection code.
>   * Handle bridges with empty I/O (or MEM) ranges.
>   * Collect the proper width of the bus range.
>
>
> r~

Content-Description: diff vs bridges-2.4.0t10
> diff -rup linux/drivers/pci/setup-bus.c 2.4.0-11-1/drivers/pci/setup-bus.c
> --- linux/drivers/pci/setup-bus.c       Wed Nov  8 01:24:16 2000
> +++ 2.4.0-11-1/drivers/pci/setup-bus.c  Wed Nov  8 01:04:17 2000
> @@ -20,7 +20,7 @@
>  #include <linux/errno.h>
>  #include <linux/ioport.h>
>  #include <linux/cache.h>
> -#include <linux/vmalloc.h>
> +#include <linux/slab.h>
>
>
>  #define DEBUG_CONFIG 1
> @@ -56,31 +56,50 @@ pbus_assign_resources_sorted(struct pci_
>                          mem_reserved += 32*1024*1024;
>                          continue;
>                  }
> +
> +                if (dev->class >> 8 == PCI_CLASS_DISPLAY_VGA)
> +                        found_vga = 1;
> +
>                  pdev_sort_resources(dev, &head_io, IORESOURCE_IO);
>                  pdev_sort_resources(dev, &head_mem, IORESOURCE_MEM);
>          }
> +
>          for (list = head_io.next; list;) {
>                  res = list->res;
>                  idx = res - &list->dev->resource[0];
> -                if (pci_assign_resource(list->dev, idx) == 0)
> +                if (pci_assign_resource(list->dev, idx) == 0
> +                    && ranges->io_end < res->end)
>                          ranges->io_end = res->end;
>                  tmp = list;
>                  list = list->next;
> -                vfree(tmp);
> +                kfree(tmp);
>          }
>          for (list = head_mem.next; list;) {
>                  res = list->res;
>                  idx = res - &list->dev->resource[0];
> -                if (pci_assign_resource(list->dev, idx) == 0)
> +                if (pci_assign_resource(list->dev, idx) == 0
> +                    && ranges->mem_end < res->end)
>                          ranges->mem_end = res->end;
>                  tmp = list;
>                  list = list->next;
> -                vfree(tmp);
> +                kfree(tmp);
>          }
> +
>          ranges->io_end += io_reserved;
>          ranges->mem_end += mem_reserved;
> +
> +        /* ??? How to turn off a bus from responding to, say, I/O at
> +           all if there are no I/O ports behind the bus?  Turning off
> +           PCI_COMMAND_IO doesn't seem to do the job.  So we must
> +           allow for at least one unit.  */
> +        if (ranges->io_end == ranges->io_start)
> +                ranges->io_end += 1;
> +        if (ranges->mem_end == ranges->mem_start)
> +                ranges->mem_end += 1;
> +
>          ranges->io_end = ROUND_UP(ranges->io_end, 4*1024);
>          ranges->mem_end = ROUND_UP(ranges->mem_end, 1024*1024);
> +
>          return found_vga;
>  }
>
> diff -rup linux/drivers/pci/setup-res.c 2.4.0-11-1/drivers/pci/setup-res.c
> --- linux/drivers/pci/setup-res.c       Wed Nov  8 01:24:16 2000
> +++ 2.4.0-11-1/drivers/pci/setup-res.c  Wed Nov  8 00:21:13 2000
> @@ -22,10 +22,10 @@
>  #include <linux/errno.h>
>  #include <linux/ioport.h>
>  #include <linux/cache.h>
> -#include <linux/vmalloc.h>
> +#include <linux/slab.h>
>
>
> -#define DEBUG_CONFIG 0
> +#define DEBUG_CONFIG 1
>  #if DEBUG_CONFIG
>  # define DBGC(args)     printk args
>  #else
> @@ -146,7 +146,7 @@ pdev_sort_resources(struct pci_dev *dev,
>                  if (ln)
>                          size = ln->res->end - ln->res->start;
>                  if (r->end - r->start > size) {
> -                        tmp = vmalloc(sizeof(*tmp));
> +                        tmp = kmalloc(sizeof(*tmp), GFP_KERNEL);
>                          tmp->next = ln;
>                          tmp->res = r;
>                          tmp->dev = dev;
Re: PCI-PCI bridges mess in 2.4.x
Hi Richard.

I'm _very_ keen to try this (my Alpha won't boot 2.4 at the mo), however I
think the attachments faery has been playing tricks again. Do you have a
patch relative to 2.4.0-test10?

Sean

On Wed, Nov 08, 2000 at 01:39:31AM -0800, Richard Henderson wrote:
[ For l-k, the issue is that pci-pci bridges and the devices behind
  them are not initialized properly. There are a number of Alphas
  whose built-in scsi controllers are behind such a bridge, preventing
  these machines from booting at all. Ivan provided an initial patch
  to solve this issue. ]

I've not gotten a chance to try this on the rawhide yet, but I did
give it a whirl on my up1000, which does have an agp bridge that
acts like a pci bridge.

Notable changes from your patch:

  * Use kmalloc, not vmalloc. (ouch!)
  * Replace cropped found_vga detection code.
  * Handle bridges with empty I/O (or MEM) ranges.
  * Collect the proper width of the bus range.

r~

Content-Description: diff vs bridges-2.4.0t10

diff -rup linux/drivers/pci/setup-bus.c 2.4.0-11-1/drivers/pci/setup-bus.c
--- linux/drivers/pci/setup-bus.c	Wed Nov  8 01:24:16 2000
+++ 2.4.0-11-1/drivers/pci/setup-bus.c	Wed Nov  8 01:04:17 2000
@@ -20,7 +20,7 @@
 #include <linux/errno.h>
 #include <linux/ioport.h>
 #include <linux/cache.h>
-#include <linux/vmalloc.h>
+#include <linux/slab.h>
 
 
 #define DEBUG_CONFIG 1
@@ -56,31 +56,50 @@ pbus_assign_resources_sorted(struct pci_
 			mem_reserved += 32*1024*1024;
 			continue;
 		}
+
+		if (dev->class >> 8 == PCI_CLASS_DISPLAY_VGA)
+			found_vga = 1;
+
 		pdev_sort_resources(dev, &head_io, IORESOURCE_IO);
 		pdev_sort_resources(dev, &head_mem, IORESOURCE_MEM);
 	}
+
 	for (list = head_io.next; list;) {
 		res = list->res;
 		idx = res - &list->dev->resource[0];
-		if (pci_assign_resource(list->dev, idx) == 0)
+		if (pci_assign_resource(list->dev, idx) == 0
+		    && ranges->io_end < res->end)
 			ranges->io_end = res->end;
 		tmp = list;
 		list = list->next;
-		vfree(tmp);
+		kfree(tmp);
 	}
 	for (list = head_mem.next; list;) {
 		res = list->res;
 		idx = res - &list->dev->resource[0];
-		if (pci_assign_resource(list->dev, idx) == 0)
+		if (pci_assign_resource(list->dev, idx) == 0
+		    && ranges->mem_end < res->end)
 			ranges->mem_end = res->end;
 		tmp = list;
 		list = list->next;
-		vfree(tmp);
+		kfree(tmp);
 	}
+
 	ranges->io_end += io_reserved;
 	ranges->mem_end += mem_reserved;
+
+	/* ??? How to turn off a bus from responding to, say, I/O at
+	   all if there are no I/O ports behind the bus?  Turning off
+	   PCI_COMMAND_IO doesn't seem to do the job.  So we must
+	   allow for at least one unit.  */
+	if (ranges->io_end == ranges->io_start)
+		ranges->io_end += 1;
+	if (ranges->mem_end == ranges->mem_start)
+		ranges->mem_end += 1;
+
 	ranges->io_end = ROUND_UP(ranges->io_end, 4*1024);
 	ranges->mem_end = ROUND_UP(ranges->mem_end, 1024*1024);
+
 	return found_vga;
 }
 
diff -rup linux/drivers/pci/setup-res.c 2.4.0-11-1/drivers/pci/setup-res.c
--- linux/drivers/pci/setup-res.c	Wed Nov  8 01:24:16 2000
+++ 2.4.0-11-1/drivers/pci/setup-res.c	Wed Nov  8 00:21:13 2000
@@ -22,10 +22,10 @@
 #include <linux/errno.h>
 #include <linux/ioport.h>
 #include <linux/cache.h>
-#include <linux/vmalloc.h>
+#include <linux/slab.h>
 
 
-#define DEBUG_CONFIG 0
+#define DEBUG_CONFIG 1
 #if DEBUG_CONFIG
 # define DBGC(args)     printk args
 #else
@@ -146,7 +146,7 @@ pdev_sort_resources(struct pci_dev *dev,
 		if (ln)
 			size = ln->res->end - ln->res->start;
 		if (r->end - r->start > size) {
-			tmp = vmalloc(sizeof(*tmp));
+			tmp = kmalloc(sizeof(*tmp), GFP_KERNEL);
 			tmp->next = ln;
 			tmp->res = r;
 			tmp->dev = dev;
Re: Loadavg calculation
Sorry, I know this is a little left-field, but how about redesigning your
process so that instead of using a load_avg, you start all your calculations
from a single server on each node?

It could queue up incoming calculations, and fork a child to do each one. Of
course, it would catch a signal when the child died, so you'd immediately
know when to start up another calculation.

If you liked, it could check the one-minute load avg from time to time to see
what would be a friendly level of calculations overall, and adjust the
overall level of concurrent child processes accordingly. The timing,
however, would still come from a signal, and would thus be instantaneous.

Or am I being totally dumb?

Sean

On Sun, Nov 05, 2000 at 07:55:40AM -0500, [EMAIL PROTECTED] wrote:
>
> I'm working on a project at work that is using Linux to run some very
> math-intensive calculations. One of the things we do is use the 1-minute
> loadavg to determine how busy the machine is and whether we can fire off
> another program to do more calculations. However, there's a problem with
> that.
>
> Because it's a 1 minute load average, there's quite a bit of lag time from
> when 1 program finishes until the loadavg goes down below a threshold for
> our control mechanism to fire off another program.
>
> Let me give an example (all on a 1-cpu PC)
>
> HH:MM:SS
> 00:00:00  fire off 4 programs
> 00:01:00  loadavg goes up to 4
> 00:01:30  3 of the 4 programs finish, loadavg still at 4
> 00:02:20  loadavg goes down to 1, below our threshold
> 00:02:21  we fire off 3 more programs.
>
> We'd like to reduce that almost 50 second lag time. Is it possible, in
> user-space, to duplicate the loadavg calculation period, say to a 15
> second load average, using the information in /proc?
>
> The other option we looked at, besides using loadavg, was using idle pct%,
> but if I read the source for top right, that involves reading the entire
> process table to calculate clock ticks used and then figuring out how many
> weren't used.
>
> Ideas, opinions welcome. Yes, I read the list, so either respond direct
> to me, or to the list.
>
> [EMAIL PROTECTED] (Robert A. Yetman)
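
A rough sketch of the forking server idea, in case it helps. Everything named here (run_queue, job_fn, demo_job) is made up for illustration, and the "queue" is just a count of jobs handed to a callback rather than a real work queue. The parent keeps at most max_children running, sleeps in sigsuspend() until SIGCHLD arrives, reaps whatever has exited and immediately starts the next job.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical job type: runs one calculation in the child and
   returns the child's exit status (0 for success). */
typedef int (*job_fn)(int job_index);

static volatile sig_atomic_t child_exited;

static void on_sigchld(int signo)
{
	(void)signo;
	child_exited = 1;
}

/* Placeholder calculation: even-numbered jobs succeed, odd ones fail. */
static int demo_job(int job_index)
{
	return job_index % 2;
}

/* Run njobs jobs with at most max_children running at once.
   Returns how many exited with status 0, or -1 on error. */
static int run_queue(int njobs, int max_children, job_fn job)
{
	struct sigaction sa, old_sa;
	sigset_t block, old_mask, wait_mask;
	int next = 0, running = 0, succeeded = 0, failed = 0, status;

	assert(job != NULL);
	if (max_children < 1)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigchld;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
	if (sigaction(SIGCHLD, &sa, &old_sa) < 0)
		return -1;

	/* Keep SIGCHLD blocked except inside sigsuspend(), so an exit that
	   happens between the check and the sleep is not lost. */
	sigemptyset(&block);
	sigaddset(&block, SIGCHLD);
	sigprocmask(SIG_BLOCK, &block, &old_mask);
	wait_mask = old_mask;
	sigdelset(&wait_mask, SIGCHLD);
	child_exited = 0;

	while (next < njobs || running > 0) {
		/* Fill every free slot from the queue. */
		while (running < max_children && next < njobs) {
			pid_t pid = fork();
			if (pid < 0) {		/* stop queueing, drain the rest */
				failed = 1;
				njobs = next;
				break;
			}
			if (pid == 0) {
				sigprocmask(SIG_SETMASK, &old_mask, NULL);
				_exit(job(next));
			}
			next++;
			running++;
		}
		if (running == 0)
			break;

		/* Sleep until the kernel reports that a child has exited. */
		while (!child_exited)
			sigsuspend(&wait_mask);
		child_exited = 0;

		/* Several children may have exited for one signal. */
		while (running > 0 && waitpid(-1, &status, WNOHANG) > 0) {
			running--;
			if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
				succeeded++;
		}
	}

	sigprocmask(SIG_SETMASK, &old_mask, NULL);
	sigaction(SIGCHLD, &old_sa, NULL);
	return failed ? -1 : succeeded;
}
```

Calling run_queue(6, 2, demo_job) runs six placeholder jobs two at a time. A real server would exec the calculation program in the child and could recompute max_children from the one-minute load average every so often, but the moment at which the next job starts would still be driven by the signal rather than by polling.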
Re: 2.4.0-test10 Sluggish After Load
On Wed, Nov 01, 2000 at 11:10:46AM -0600, matthew wrote:
> On Wed, 1 Nov 2000, Sean Hunter wrote:
>
> > Pardon my speculations (if I am wrong), but isn't this an oracle question?
> >
> It could be.
>
> > Isn't oracle killing the server by trying to clean up 1800 connections all at
> > once? When they're all connected, most of the work is done by one or two
> > oracle processes, but when you kill your ddos thing, all of the oracle
> > listeners (of which there is one per connection), steam in and try to clean up.
> >
> Yes, but the factor that drove me to the list was that it's been > 400
> load average for 10 hours now. Even if Oracle tried to clean up 1800
> connections at once, would it take this long? That's not rhetorical, as
> the answer may well be "yes".
>

Yup. What seems to have happened is that waking up 1800 processes at once has
caused the box to thrash so hard it is taking ages for any one process to get
enough scheduler time to clean itself up and exit.

I guess we may need a thrash preventer that slows things down enough for each
process to get a healthy bite of the cherry.

Sean

> > I thought oracle had an internal connection limit (on our servers it is set to
> > 440 connections), anyways.
> >
> This is set in the init.ora. I jacked it up to allow > 2000 connections.
>
> Matthew