Bug#538158: Still a problem as of 5.0.3

2009-10-27 Thread Arcady Genkin
Ben Hutchings b...@decadent.org.uk writes:

 If it's a lot of trouble for you to do that then perhaps it's not worth
 doing until we've investigated a bit further.

I have outfitted our test server with 8GB of memory.  I'm going to let
it sit idle with 2.6.26-2-686-bigmem just to see whether it shows the
stuck CPU syndrome.  If it does, I'll install 2.6.30 from squeeze.

 Can you tell me anything about the workload of these systems, e.g. are
 they running an NFS server, database server, web server...?  Do they
 have heavy disk I/O, network I/O, task load or swap usage?

The two machines are general purpose compute servers and NX servers.
People log in to run their KDE desktop and do whatever it is that they
need to do.  The machines are big NFS clients, but they run no special
services.  Local disk activity is minimal, and with 28GB of RAM the
swap has never been used so far.  So, network activity is higher than
on a typical workstation, but the average sustained rate is not that
high.  There are usually over a thousand concurrent processes running.

 In fact, after further investigation, it looks quite plausible that this
 bug is related to long idle periods.  I found some bug fixes made in
 Linux 2.6.27 that may address this.

This would make sense, because we saw more of this bug before we
announced the servers to our users, when they (the servers) just sat
idle.

Thanks,
-- 
Arcady Genkin : CDF Systems Administrator
http://www.cdf.toronto.edu/~agenkin/



-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#538158: Still a problem as of 5.0.3

2009-10-24 Thread Ben Hutchings
On Thu, 2009-10-22 at 01:20 +0100, Ben Hutchings wrote:
 On Wed, 2009-10-21 at 17:01 -0400, Arcady Genkin wrote:
[...]
  Before
  that, when we were testing the servers, the problem sometimes would
  occur several times per day on each machine.  The kernel has not
  changed since end of August, I believe.
  
  Perhaps an idle system is more susceptible to this problem.
 
 It seems unlikely, but it is conceivable.

In fact, after further investigation, it looks quite plausible that this
bug is related to long idle periods.  I found some bug fixes made in
Linux 2.6.27 that may address this.

Ben.

-- 
Ben Hutchings
The obvious mathematical breakthrough [to break modern encryption] would be
development of an easy way to factor large prime numbers. - Bill Gates


signature.asc
Description: This is a digitally signed message part


Bug#538158: Still a problem as of 5.0.3

2009-10-21 Thread Arcady Genkin
Ben Hutchings b...@decadent.org.uk writes:

 Can you test whether this also occurs in Linux 2.6.30?  You can install
 this from unstable without replacing the current kernel package and
 without pulling in any dependencies from unstable.

We have already put the two servers into production, and I cannot
switch kernels on them.  We have a backup server with the same
hardware, but it's missing the RAM.  I'll try to scramble together
some RAM to see whether I can reproduce this with 2.6.30 on the
backup machine.  It won't be exact replication, though, because I
cannot easily find the whole 28GB of ECC memory: perhaps 4 or 8GB.

This bug happens very unpredictably.  Out of the two production
servers, one no longer shows this problem at all (but used to in early
September), and the other had it after the last boot-up (in the first
days of October), but it has not recurred since then.  Before
that, when we were testing the servers, the problem sometimes would
occur several times per day on each machine.  The kernel has not
changed since end of August, I believe.

Perhaps an idle system is more susceptible to this problem.

Thanks for your reply,
-- 
Arcady Genkin






Bug#538158: Still a problem as of 5.0.3

2009-10-21 Thread Ben Hutchings
On Wed, 2009-10-21 at 17:01 -0400, Arcady Genkin wrote:
 Ben Hutchings b...@decadent.org.uk writes:
 
  Can you test whether this also occurs in Linux 2.6.30?  You can install
  this from unstable without replacing the current kernel package and
  without pulling in any dependencies from unstable.
 
 We have already put the two servers into production, and I cannot
 switch kernels on them.  We have a backup server with the same
 hardware, but it's missing the RAM.  I'll try to scramble together
 some RAM to see whether I can reproduce this with 2.6.30 on the
 backup machine.  It won't be exact replication, though, because I
 cannot easily find the whole 28GB of ECC memory: perhaps 4 or 8GB.

If it's a lot of trouble for you to do that then perhaps it's not worth
doing until we've investigated a bit further.

 This bug happens very unpredictably.  Out of the two production
 servers, one no longer shows this problem at all (but used to in early
 September), and the other had it after the last boot-up (in the first
 days of October), but it has not recurred since then.

Can you tell me anything about the workload of these systems, e.g. are
they running an NFS server, database server, web server...?  Do they
have heavy disk I/O, network I/O, task load or swap usage?

 Before
 that, when we were testing the servers, the problem sometimes would
 occur several times per day on each machine.  The kernel has not
 changed since end of August, I believe.
 
 Perhaps an idle system is more susceptible to this problem.

It seems unlikely, but it is conceivable.

Ben.

-- 
Ben Hutchings
Everything should be made as simple as possible, but not simpler.
   - Albert Einstein




Bug#538158: Still a problem as of 5.0.3

2009-10-04 Thread Ben Hutchings
On Thu, 2009-09-10 at 13:56 -0400, Arcady Genkin wrote:
 This problem is still there with linux-2.6/2.6.26-19 from Debian
 release 5.0.3.

Can you test whether this also occurs in Linux 2.6.30?  You can install
this from unstable without replacing the current kernel package and
without pulling in any dependencies from unstable.

If the bug also occurs in Linux 2.6.30, please report this at
http://bugzilla.kernel.org, setting product to
'Platform Specific/Hardware' and component to 'i386'.  Let us know the
bug number so we can
track its progress.

Otherwise, we can try to work out how this was fixed upstream.

Ben.

-- 
Ben Hutchings
I say we take off; nuke the site from orbit.  It's the only way to be sure.




Bug#538158: Still a problem as of 5.0.3

2009-09-10 Thread Arcady Genkin
This problem is still there with linux-2.6/2.6.26-19 from Debian
release 5.0.3.
-- 
Arcady Genkin


