Apparently I am still not doing a very good job of explaining the
situation, so let me try again.

on 10.10 the best case idle temp is 51C
on 12.04 the best case idle temp is about 57C
on 13.04 the best case idle temp is 75C  (that is at idle! more typically
it is in the low 80s)

under full load:
the max temp of 10.10 is 83C
the max temp of 12.04 is 81C
the max temp of 13.04 is 96C and climbing...  (I shut it down rather than
risk a meltdown)

It is expected that 12.04 would run a little hotter than 10.10, because
10.10 is 32-bit with the generic kernel whereas the others are 64-bit
with the low-latency kernel.  The 2C difference between 10.10 and 12.04
at full load (83C vs. 81C) was probably due to inconsistency in the load
itself; in any case, they are close enough not to matter.

Suppose that blowing the dust out of the fan achieved an unrealistic
gain of 10C in cooling.  That would still leave 13.04 at 86C (or higher)
under full load, which really is still too hot.  Moreover, blowing the
dust out of the fan would be expected to affect all of the OSes equally.
So that still leaves us with a differential of 75 - 51 = 24C at idle and
96 - 83 = 13C at 100% load between 13.04 and 10.10.
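
To make the arithmetic explicit, here is a small sketch using only the
numbers listed above.  The function names are just for illustration, and
the 10C gain is the same hypothetical figure as in the paragraph above,
not a measurement.

```python
# Best-case idle and full-load temperatures in degrees C, as listed above.
TEMPS = {
    "10.10": {"idle": 51, "load": 83},
    "12.04": {"idle": 57, "load": 81},
    "13.04": {"idle": 75, "load": 96},
}

def cool_all(temps, gain):
    """Subtract the same cooling gain (e.g. a cleaned fan) from every reading."""
    return {rel: {k: v - gain for k, v in t.items()} for rel, t in temps.items()}

def differential(temps, release, baseline="10.10"):
    """How much hotter `release` runs than `baseline`, at idle and under load."""
    return {k: temps[release][k] - temps[baseline][k] for k in ("idle", "load")}

before = differential(TEMPS, "13.04")
after = differential(cool_all(TEMPS, 10), "13.04")  # hypothetical 10C gain
print(before)  # {'idle': 24, 'load': 13}
print(after)   # {'idle': 24, 'load': 13}
```

A uniform gain lowers every reading but leaves the gap between the
releases exactly where it was.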

That is a huge difference, and the only source of that difference is the
software.  If I were running these tests years apart, it would be
reasonable to blame dust for the difference, but I am not.  I have a
multi-boot setup and I am running the tests within ten minutes of each
other, so hardware differences are ruled out: it is the same hardware
and the same amount of dust.

Furthermore, on 10.10, when the temp gets down to around 55C the fan
shuts off.  So however much dust there might be, it is not even a factor
for the idle temperature of 10.10, because under light load or at idle
it does not even use the fan.  With the other OS versions the fan never
shuts off; on 12.04 it can get pretty quiet, but on 13.04 the fan is
always loud, even at idle.

So there is indeed a very serious problem here.

I started out thinking this was a video driver issue, because I had
observed that from idle up to moderate loads the video temperature was
consistently 2 to 3 degrees hotter than the other temps.  However, in
further testing I observed that at high loads the other temperatures
greatly exceeded the video temperature.  So at that point I back-tracked
on my assumption that the video driver was the sole culprit.

It has been my experience as a programmer that when one is presented
with a complex set of symptoms, it is usually the result of multiple
bugs appearing to be a single problem.  So, I'm inclined to suspect that
there is a problem with the radeon and that there is also a problem with
the cpu scheduler.

Right now I am working on two things.  The first is that I have done a
fresh, clean install of 12.04, because my main 12.04 install has had a
lot of changes made, so I want a pristine OS for testing.  The other
chief advantage is that on the stock OS, lm-sensors is able to read the
radeon temperature, but on my main 12.04 lm-sensors is no longer able to
read the radeon temp.  So, with lm-sensors working properly we can have
a direct comparison with 13.04.

The second thing that I am working on is trying to solve the problem of
why lm-sensors can't read the fan speed.  I think it would be very
helpful to see exactly what the fans are doing.
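
For anyone who wants to check what the kernel itself exposes,
independent of lm-sensors, here is a minimal sketch that walks the hwmon
sysfs tree.  It assumes the standard hwmon layout (/sys/class/hwmon,
tempN_input in millidegrees C, fanN_input in RPM); the function names
are my own, and whether a fanN_input file exists at all depends on the
driver for the particular machine.

```python
import glob
import os

def _read(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None

def read_hwmon(root="/sys/class/hwmon"):
    """Collect temperature (deg C) and fan (RPM) readings from hwmon sysfs."""
    readings = {}
    for chip in sorted(glob.glob(os.path.join(root, "hwmon*"))):
        # Older kernels keep the attribute files under hwmonN/device/.
        for base in (chip, os.path.join(chip, "device")):
            name = _read(os.path.join(base, "name")) or os.path.basename(chip)
            for pattern, scale in (("temp*_input", 1000.0), ("fan*_input", 1)):
                for path in sorted(glob.glob(os.path.join(base, pattern))):
                    raw = _read(path)
                    try:
                        value = int(raw) / scale if raw is not None else None
                    except ValueError:
                        value = None  # unexpected file contents; skip it
                    if value is not None:
                        readings[name + "/" + os.path.basename(path)] = value
    return readings

for key, value in sorted(read_hwmon().items()):
    print(key, value)
```

If no fan entries show up here, then the kernel is not exposing the fan
speed at all and lm-sensors has nothing to read.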

Now, in preliminary testing of the new 12.04 install with nothing added
(except temperature reading software) and no updates, I was very
surprised to see that it was running substantially hotter than my main
12.04, which has had zillions of updates and other changes.  This is a
preliminary finding; I ran out of time and was not yet able to pursue
it further, but the result was quite surprising.

Something else that I have played with a little bit is the CPU frequency
scaling governor.  Changing it from ondemand to powersave shows cpu use
going way up at the same time that temperature goes down.  The problem
appears to be that the time spent on power saving is not being properly
accounted for in the cpu load calculation.  htop returns numbers that
are nearly identical to my own program's calculations, so it may be a
problem with the kernel's accounting.
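
For reference, this is roughly how a cpu load figure is commonly derived
from /proc/stat.  It is a sketch under the usual assumptions about the
aggregate "cpu" line (user, nice, system, idle, iowait, irq, softirq,
steal), not the actual code of htop or of my own program, and the sample
counters below are made up for illustration.

```python
def parse_cpu_line(stat_text):
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")

def utilization(before, after):
    """Fraction of elapsed time spent non-idle between two samples."""
    # Only the first 8 fields are summed; guest time is already
    # included in the user and nice counters.
    b, a = before[:8], after[:8]
    total = sum(a) - sum(b)
    idle = (a[3] + a[4]) - (b[3] + b[4])  # idle + iowait
    return 0.0 if total <= 0 else (total - idle) / total

# Made-up samples, standing in for two reads of /proc/stat.
s1 = parse_cpu_line("cpu  100 0 50 800 50 0 0 0 0 0\n")
s2 = parse_cpu_line("cpu  160 0 80 1500 60 0 0 0 0 0\n")
print(utilization(s1, s2))
```

Note that this figure is a fraction of elapsed time only; nothing in it
knows what clock frequency the cpu was running at during that time,
which may be relevant to the numbers changing when the governor changes.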

The bottom line is that there is an overheating problem (in fact it may
be two different problems), and this is a serious issue because it can
lead to hardware destruction.

But figuring out what the problem is will be difficult.  I am working on
some more tests to try to make the situation clearer.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1166916

Title:
  temperature overheating of cpu and radeon in 12.10 and above

