Bug#538158: Still a problem as of 5.0.3
Ben Hutchings b...@decadent.org.uk writes:

> If it's a lot of trouble for you to do that then perhaps it's not worth doing until we've investigated a bit further.

I have outfitted our test server with 8GB of memory. I'm going to let it sit idle with 2.6.26-2-686-bigmem just to see whether it shows the stuck CPU syndrome. If it does, I'll install 2.6.30 from squeeze.

> Can you tell me anything about the workload of these systems, e.g. are they running an NFS server, database server, web server...? Do they have heavy disk I/O, network I/O, task load or swap usage?

The two machines are general purpose compute servers and NX servers. People log in to run their KDE desktop and do whatever it is that they need to do. The machines are big NFS clients, but they run no special services. Local disk activity is minimal, and with 28GB of RAM the swap has never been used so far. So, network activity is higher than a typical workstation, but the average sustained rate is not that high. There usually are over a thousand concurrent processes running.

> In fact, after further investigation, it looks quite plausible that this bug is related to long idle periods. I found some bug fixes made in Linux 2.6.27 that may address this.

This would make sense, because we saw more of this bug before we announced the servers to our users, and they (the servers) just sat idle.

Thanks,
--
Arcady Genkin : CDF Systems Administrator
http://www.cdf.toronto.edu/~agenkin/

--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#538158: Still a problem as of 5.0.3
On Thu, 2009-10-22 at 01:20 +0100, Ben Hutchings wrote:
> On Wed, 2009-10-21 at 17:01 -0400, Arcady Genkin wrote:
> [...]
> > Before that, when we were testing the servers, the problem sometimes would occur several times per day on each machine. The kernel has not changed since end of August, I believe. Perhaps an idle system is more susceptible to this problem.
>
> It seems unlikely, but it is conceivable.

In fact, after further investigation, it looks quite plausible that this bug is related to long idle periods. I found some bug fixes made in Linux 2.6.27 that may address this.

Ben.

--
Ben Hutchings
The obvious mathematical breakthrough [to break modern encryption] would be development of an easy way to factor large prime numbers. - Bill Gates
Bug#538158: Still a problem as of 5.0.3
Ben Hutchings b...@decadent.org.uk writes:

> Can you test whether this also occurs in Linux 2.6.30? You can install this from unstable without replacing the current kernel package and without pulling in any dependencies from unstable.

We have already put the two servers into production, and I cannot switch kernels on them. We have a backup server with the same hardware, but it's missing the RAM. I'll try to scramble together some RAM to see whether I can reproduce this with 2.6.30 on the backup machine. It won't be exact replication, though, because I cannot easily find the whole 28GB of ECC memory: perhaps 4 or 8GB.

This bug happens very unpredictably. Out of the two production servers, one no longer shows this problem at all (but used to in early September), and the other had it after the last boot-up (in the first days of October), but, since then, it has not reoccurred.

Before that, when we were testing the servers, the problem sometimes would occur several times per day on each machine. The kernel has not changed since end of August, I believe. Perhaps an idle system is more susceptible to this problem.

Thanks for your reply,
--
Arcady Genkin
Bug#538158: Still a problem as of 5.0.3
On Wed, 2009-10-21 at 17:01 -0400, Arcady Genkin wrote:
> Ben Hutchings b...@decadent.org.uk writes:
>
> > Can you test whether this also occurs in Linux 2.6.30? You can install this from unstable without replacing the current kernel package and without pulling in any dependencies from unstable.
>
> We have already put the two servers into production, and I cannot switch kernels on them. We have a backup server with the same hardware, but it's missing the RAM. I'll try to scramble together some RAM to see whether I can reproduce this with 2.6.30 on the backup machine. It won't be exact replication, though, because I cannot easily find the whole 28GB of ECC memory: perhaps 4 or 8GB.

If it's a lot of trouble for you to do that then perhaps it's not worth doing until we've investigated a bit further.

> This bug happens very unpredictably. Out of the two production servers, one no longer shows this problem at all (but used to in early September), and the other had it after the last boot-up (in the first days of October), but, since then, it has not reoccurred.

Can you tell me anything about the workload of these systems, e.g. are they running an NFS server, database server, web server...? Do they have heavy disk I/O, network I/O, task load or swap usage?

> Before that, when we were testing the servers, the problem sometimes would occur several times per day on each machine. The kernel has not changed since end of August, I believe. Perhaps an idle system is more susceptible to this problem.

It seems unlikely, but it is conceivable.

Ben.

--
Ben Hutchings
Everything should be made as simple as possible, but not simpler. - Albert Einstein
Bug#538158: Still a problem as of 5.0.3
On Thu, 2009-09-10 at 13:56 -0400, Arcady Genkin wrote:
> This problem is still there with linux-2.6/2.6.26-19 from Debian release 5.0.3.

Can you test whether this also occurs in Linux 2.6.30? You can install this from unstable without replacing the current kernel package and without pulling in any dependencies from unstable.

If the bug also occurs in Linux 2.6.30, please report this at http://bugzilla.kernel.org, setting product to 'Platform Specific/Hardware' and component to 'i386'. Let us know the bug number so we can track its progress.

Otherwise, we can try to work out how this was fixed upstream.

Ben.

--
Ben Hutchings
I say we take off; nuke the site from orbit. It's the only way to be sure.
Bug#538158: Still a problem as of 5.0.3
This problem is still there with linux-2.6/2.6.26-19 from Debian release 5.0.3.

--
Arcady Genkin