Found it!
I'll put y'all out of your misery. The workaround is:
Edit /etc/default/acpi-support and comment out "POST_VIDEO=true".
This is hardly a general solution for Gutsy (then again, acpi-support
seems to be a seething mass of hacks upon hacks already, it would seem
to fit in somewhere...).
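For anyone who wants to script the workaround, here is a minimal sketch. It assumes GNU sed (for -i) and that the setting sits on a line of its own beginning with POST_VIDEO=; the function name disable_post_video is made up for illustration and is not part of acpi-support.

```shell
# Comment out the POST_VIDEO setting in an acpi-support style config file.
# Usage (as root): disable_post_video /etc/default/acpi-support
disable_post_video() {
    conf="$1"
    # Keep a backup next to the original before editing in place.
    cp -p -- "$conf" "$conf.bak" || return 1
    # Prefix any uncommented POST_VIDEO= line with '#'.
    # Lines that are already commented out do not match and are left alone.
    sed -i 's/^\([[:space:]]*\)\(POST_VIDEO=\)/\1#\2/' "$conf"
}
```

Restoring the .bak copy undoes the change. Note that running the function a second time overwrites the backup with the already edited file.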
Basically, it's not suspend/resume which causes the page table
corruption, it's the "vbetool post" command. You can cause identical
effects without suspending, using this command from inside a terminal
window:
sudo sh -c 'chvt 1; sleep 2; vbetool post; vbetool vbestate restore </var/lib/acpi-support/vbestate; sleep 2; chvt 7'
If you run that command after booting and starting X, from a terminal
window, then when it finally returns to showing X, the screen artifact
will be present, and it may be causing more serious random memory
corruption behind the scenes (which hasn't been noticed so far).
If you don't run that command during suspend/resume (by editing
/etc/default/acpi-support), then there's no screen artifact or page
table corruption.
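To check whether a particular command changes the register or page table dump, something like the following sketch could help. It assumes the dumper (intel_reg_dumper by default) writes its dump to stdout when run with no arguments, which may not hold for the modified version; dump_around and DUMP_CMD are names invented for this example.

```shell
# Command used to take a dump. Assumed to print the dump on stdout.
DUMP_CMD="${DUMP_CMD:-intel_reg_dumper}"

# Take a dump, run an action, take another dump, and say whether they differ.
# Usage (as root), with the action being the command under test:
#   dump_around /tmp/post-test sh -c 'chvt 1; sleep 2; vbetool post; sleep 2; chvt 7'
dump_around() {
    outdir="$1"; shift
    mkdir -p -- "$outdir" || return 1
    $DUMP_CMD > "$outdir/before.txt" || return 1
    "$@"    # the action under test
    $DUMP_CMD > "$outdir/after.txt" || return 1
    if cmp -s "$outdir/before.txt" "$outdir/after.txt"; then
        echo "no change"
    else
        echo "changed"
    fi
}
```

The two files left in the output directory can then be compared by hand with diff. Registers that change on every console switch will also show up as "changed", so this is only a first filter.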
The other vbetool commands ("vbetool vbemode get" and "vbetool vbestate
restore") don't seem to affect the page table *in my tests*. That is no
guarantee, though: vbestate restore could in principle clobber the page
table too.
vbetool vbestate restore is still required, unfortunately, by ACPI
resume, to restore the text console. It does *not* seem to be required
for X to resume; only the text console fails to display if you disable
this command.
Perhaps the cleanest fix is either (1) to have "vbetool post" save and
restore the page table around POST, or (2) for the kernel to implement
memory protection when executing VBE calls and detect such changes and
update its own structures. Is the page table a generic AGP page table,
independent of the video driver, which the kernel tracks?
Now that I have a fix which works on my laptop, I'm happy. (Just
comment out POST_VIDEO in /etc/default/acpi-support.) Still, it will be
nice when the bug is fixed in the general distribution, without a
laptop-specific workaround.
Peter Clifton wrote:
> [snip] what is the reason for the corrupt page tables?
> Urm... pass on that one ;)
> Jamie? Does the page table look the same before suspend? (Or when the bug is
> not showing?)
At first I thought I saw the screen corruption after boot, without
suspending, but when I tried it today I was surprised to see no
corruption after a fresh boot (and power cycle), and then it appeared
after suspend/resume... I guess I was wrong about it appearing straight
away, then.
(The reason I didn't respond earlier was because I didn't reboot my
laptop for a couple of weeks, as I had work on it I didn't want to lose
track of. It's nice to see that suspend/resume are reliable enough to
use them several times a day for a couple of weeks now! Earlier
versions of this driver often crashed during suspend, or during switch
to console. It seems reliable now.)
Then I suspected one of the steps in the ACPI process, and looked in the
scripts in /etc/acpi/{suspend,resume}.d for clues. Then I
systematically did some tests with your intel_reg_dumper, and here's
what I found....
Attached is an archive containing 11 dumps from your modified
intel_reg_dumper, including the page tables. The dumps were taken in
sequence, following a single boot. The boot was done after a power
cycle, removing the battery and mains, to be sure of the state.
The X driver is Gutsy's current xserver-xorg-video-intel,
version 2:2.1.1-0ubuntu9.
intel_reg_dump_01_after_boot: After a power cycle (removed battery and
mains), booting through Ubuntu's splash screen and eventually to running
X, logging in, and switching to dual head with TMDS-1 = 1680x1050+0+0
and LVDS = 1280x800+400+1050. This will show the artifact, and also
demonstrates nicely that suspend/resume works fine with dual head.
intel_reg_dump_02_console_and_back: Switched to a text console, and back
to X on console 7, then dumped registers again. Some changes, for the
curious. I wanted to get closer to the state which is restored by
suspend/resume, as that always switches to the console and back.
intel_reg_dump_03_vberestore_and_back: Switched to console 1, ran
"vbetool vbestate restore </var/lib/acpi-support/vbestate", then
switched back to X, then did this dump. This restores the console VBE
state which was saved during the boot sequence. Suspend/resume uses
this same state during resume (it doesn't store the state during
suspend, it always uses the boot time state). There's no change to the
page tables, but there are some curious register changes which are
hopefully harmless. (The TV control knobs are all zeroed, which might
be related to TV suspend/resume bugs if anyone's interested.)
intel_reg_dump_04_suspend_resume: Did ACPI suspend/resume started by
gnome-power-manager, then did this dump after the resume. The console
before and after was the X console. This ran through all the scripts in
/etc/acpi/suspend.d and /etc/acpi/resume.d as usual. All ACPI scripts
are as distributed in Gutsy, but I changed /etc/default/acpi-support to
comment out (disabling) SAVE_VBE_STATE and POST_VIDEO. That means
"vbetool post" and "vbetool vbestate restore" were *not* run, and the
only video-related thing done by the ACPI scripts was to change to a
text console, and do "vbetool dpms off/on", which seems to be harmless.
intel_reg_dump_05_console_and_back: Switched to a text console, and back
to X, then did this dump.
intel_reg_dump_06_console_vberestore_and_back: Switched to the console,
did "vbetool post; vbetool vbestate restore
</var/lib/acpi-support/vbestate", then switched back to X, then did
this dump. The restored state was whatever was saved at boot time
(suspend doesn't save the state). This restores the text console, which
otherwise doesn't show after resume. X does show after resume without
this.
intel_reg_dump_07_console: Switched to a text console, then did this dump
*while at the text console*. All earlier dumps were inside X. You can
see how page table entries are freed when switching to the console, by
pointing them to the "scratch" page at 0x837814001 (the actual value
varies from boot to boot). Alas, I didn't think to get a register dump
of the console before vberestore (when the text console doesn't show).
intel_reg_dump_08_console_vbepost_vberestore: At the text console, did
"vbetool post" then "vbetool vbestate restore
</var/lib/acpi-support/vbestate", then did this dump. Two things are
clear from this:
1) "vbetool post" doesn't do anything interesting to the Intel registers
that isn't already restored by "vbetool vbestate restore", so it's not
required, at least on my laptop; 2) "vbetool post", by itself, corrupts
the page table by storing 0x83ffbe001 in spurious places (causing the
visible artifact), and in *many* but *not all* places which previously
contained the "scratch" page, 0x837814001.
intel_reg_dump_09_console_vberestore: At the text console, did "vbetool
vbestate restore </var/lib/acpi-support/vbestate" and did this dump,
just to see if there were changes. There were, but nothing significant.
intel_reg_dump_10_console_vberestore: Same as previous.
intel_reg_dump_11_back_to_X: Switched back to the X console, then did this
dump. The difference between dumps 06 and 11 shows no important
register differences (I think), but the page table corruption caused by
"vbetool post" is clear. The artifact (part row duplicated between two
parts of the screen) is now visible on screen, and wasn't before.
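For anyone going through the dumps, comparisons like the ones described above could be done with a couple of small helpers. This is only a sketch: it assumes the dumps are plain text and print page table values in the same hex form quoted in this message, and the names count_value and added_lines are invented here.

```shell
# Count the lines of a dump that mention a value, e.g. 0x83ffbe001.
# Assumption: the value appears literally in the dump text.
count_value() {
    grep -c -i -F -- "$2" "$1"
}

# Print the lines that appear only in the second of two dumps.
added_lines() {
    diff -- "$1" "$2" | sed -n 's/^> //p'
}

# Example (adjust the file names to whatever the archive contains):
#   count_value intel_reg_dump_07_console 0x837814001
#   count_value intel_reg_dump_08_console_vbepost_vberestore 0x83ffbe001
#   added_lines intel_reg_dump_06_console_vberestore_and_back intel_reg_dump_11_back_to_X
```

The count matches substrings, so a longer value that happens to contain the same digits would be counted as well.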
** Attachment added: "Intel register and page table dumps after a series of
events"
http://launchpadlibrarian.net/10464389/intel_reg_dumps.tar.bz2
--
screen artifacts after resume, part row of pixels in error (945GM)
https://bugs.launchpad.net/bugs/91966