Bug#608278: linux-image-2.6.32-5-amd64: Random general protection faults at boot

2011-07-30 Thread Moritz Mühlenhoff
On Mon, Jan 03, 2011 at 08:32:26PM +0000, Ben Hutchings wrote:
 On Mon, 2011-01-03 at 15:19 +0100, Andrea Spadaccini wrote:
  Hi,
  
   This is presumably caused by a bug in rtl8187.  However, it is not a
   general protection fault so there may be two different bugs here.
  
   Yes, the GPF is around sec 30. After that bug, I blacklisted rtl8187,
   and that resulted in the other GPFs at boot time, which are not logged.
  
   Can you try blacklisting the radeon driver?
  
  If I blacklist the radeon driver, I don't get any GPFs anymore.
  
  I did 10 reboots, and no problems.
  
  Can I help you any further in diagnosing the problem?
 
 Please test the package of Linux 2.6.37-rc7 from experimental, with
 radeon enabled again.

Did you test with a more recent kernel?

Cheers,
Moritz



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#608278: linux-image-2.6.32-5-amd64: Random general protection faults at boot

2011-01-03 Thread Andrea Spadaccini

Hi,

 This is presumably caused by a bug in rtl8187.  However, it is not a
 general protection fault so there may be two different bugs here.

 Yes, the GPF is around sec 30. After that bug, I blacklisted rtl8187,
 and that resulted in the other GPFs at boot time, which are not logged.

 Can you try blacklisting the radeon driver?

If I blacklist the radeon driver, I don't get any GPFs anymore.

I did 10 reboots, and no problems.

Can I help you any further in diagnosing the problem?

Thanks,
Andrea






Bug#608278: linux-image-2.6.32-5-amd64: Random general protection faults at boot

2011-01-03 Thread Ben Hutchings
On Mon, 2011-01-03 at 15:19 +0100, Andrea Spadaccini wrote:
 Hi,
 
  This is presumably caused by a bug in rtl8187.  However, it is not a
  general protection fault so there may be two different bugs here.
 
  Yes, the GPF is around sec 30. After that bug, I blacklisted rtl8187,
  and that resulted in the other GPFs at boot time, which are not logged.
 
  Can you try blacklisting the radeon driver?
 
 If I blacklist the radeon driver, I don't get any GPFs anymore.
 
 I did 10 reboots, and no problems.
 
 Can I help you any further in diagnosing the problem?

Please test the package of Linux 2.6.37-rc7 from experimental, with
radeon enabled again.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.




Bug#608278: linux-image-2.6.32-5-amd64: Random general protection faults at boot

2011-01-02 Thread Ben Hutchings
On Thu, 2010-12-30 at 10:08 +0100, Andrea Spadaccini wrote:
 Hi,
 
  Justification: breaks the whole system
 
  This does *not* break the whole system - the package works for other
  people and even you are able to reboot into another kernel version.
 
 Sorry, I misinterpreted the options.
 
 [cut]
 
  Dec 28 11:33:38 beriserv kernel: [   20.820173]  [a000f6b8] ? usb_control_msg+0x124/0x135 [usbcore]
  Dec 28 11:33:38 beriserv kernel: [   20.820173]  [8104800d] ? finish_task_switch+0x3a/0xaf
  Dec 28 11:33:38 beriserv kernel: [   20.820173]  [a031bd2e] ? rtl818x_ioread8+0x61/0x7e [rtl8187]
  Dec 28 11:33:38 beriserv kernel: [   20.820173]  [a031bd6f] ? rtl8187_is_radio_enabled+0x24/0xc0 [rtl8187]
  Dec 28 11:33:38 beriserv kernel: [   20.820173]  [a031be30] ? rtl8187_rfkill_poll+0x25/0x78 [rtl8187]
  [...]
 
  This is presumably caused by a bug in rtl8187.  However, it is not a
  general protection fault so there may be two different bugs here.
 
 Yes, the GPF is around sec 30. After that bug, I blacklisted rtl8187,
 and that resulted in the other GPFs at boot time, which are not logged.

Can you try blacklisting the radeon driver?
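
For readers following along: on Debian, a module is normally blacklisted with a `blacklist` line in a file under /etc/modprobe.d/. The Python sketch below only generates that text. The helper name `blacklist_conf` and the target file name mentioned in the comment are assumptions for illustration, not something stated in this thread.

```python
def blacklist_conf(modules):
    """Return modprobe.d text that blacklists each named kernel module."""
    lines = ["# Prevent automatic loading while debugging boot-time faults"]
    lines += [f"blacklist {name}" for name in modules]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed target: /etc/modprobe.d/blacklist-radeon.conf (written as root).
    print(blacklist_conf(["radeon"]), end="")
```

Because radeon may also be loaded from the initramfs, regenerating it afterwards (for example with `update-initramfs -u`) is probably needed before the blacklist takes effect at boot.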

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.




Bug#608278: linux-image-2.6.32-5-amd64: Random general protection faults at boot

2010-12-30 Thread Andrea Spadaccini

Hi,

 Justification: breaks the whole system

 This does *not* break the whole system - the package works for other
 people and even you are able to reboot into another kernel version.

Sorry, I misinterpreted the options.

[cut]

 Dec 28 11:33:38 beriserv kernel: [   20.820173]  [a000f6b8] ? usb_control_msg+0x124/0x135 [usbcore]
 Dec 28 11:33:38 beriserv kernel: [   20.820173]  [8104800d] ? finish_task_switch+0x3a/0xaf
 Dec 28 11:33:38 beriserv kernel: [   20.820173]  [a031bd2e] ? rtl818x_ioread8+0x61/0x7e [rtl8187]
 Dec 28 11:33:38 beriserv kernel: [   20.820173]  [a031bd6f] ? rtl8187_is_radio_enabled+0x24/0xc0 [rtl8187]
 Dec 28 11:33:38 beriserv kernel: [   20.820173]  [a031be30] ? rtl8187_rfkill_poll+0x25/0x78 [rtl8187]

 [...]

 This is presumably caused by a bug in rtl8187.  However, it is not a
 general protection fault so there may be two different bugs here.

Yes, the GPF is around sec 30. After that bug, I blacklisted rtl8187,
and that resulted in the other GPFs at boot time, which are not logged.


Andrea






Bug#608278: linux-image-2.6.32-5-amd64: Random general protection faults at boot

2010-12-29 Thread Andrea Spadaccini
Package: linux-2.6
Version: 2.6.32-29
Severity: critical
Justification: breaks the whole system


The system randomly (say, 2 times every 3 boots) shows General Protection Fault
errors at boot time. I don't get to multiuser init levels.

When those errors occur, I can't do anything; the system seems unresponsive.

When I press NumLock, the system shows a message like
BUG: scheduling while atomic: swapper/0/0x10100

And then hangs up completely.

The info displayed in the logs, however, changes from time to time.

I did a memtest, and the RAM is OK.

Reverting to 2.6.32-3-amd64 fixed the problem.

-- Package-specific info:
** Kernel log: boot messages should be attached

I can't get the exact error messages, as the system can't boot and I don't have
a digital camera at hand. I found in /var/log/syslog.0 a similar trace, but I 
don't know if it's the same problem. Here it is:

Dec 28 11:33:38 beriserv kernel: [   20.816204] [ cut here ]
Dec 28 11:33:38 beriserv kernel: [   20.817009] kernel BUG at /build/buildd-linux-2.6_2.6.32-28-amd64-EUJiNq/linux-2.6-2.6.32/debian/build/source_amd64_none/mm/slub.c:2969!
Dec 28 11:33:38 beriserv kernel: [   20.817895] invalid opcode:  [#1] SMP
Dec 28 11:33:38 beriserv kernel: [   20.818786] last sysfs file: /sys/module/processor/initstate
Dec 28 11:33:38 beriserv kernel: [   20.819697] CPU 0
Dec 28 11:33:38 beriserv kernel: [   20.820173] Modules linked in: acpi_cpufreq cpufreq_conservative cpufreq_powersave cpufreq_userspace cpufreq_stats fuse loop firewire_sbp2 snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm arc4 snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq ecb snd_timer radeon snd_seq_device rtl8187 ttm drm_kms_helper mac80211 led_class cfg80211 drm i2c_algo_bit snd rfkill i2c_i801 soundcore eeprom_93cx6 i2c_core button snd_page_alloc evdev pcspkr asus_atk0110 processor serio_raw ext3 jbd mbcache usbhid hid sg sd_mod crc_t10dif sr_mod cdrom ata_generic pata_jmicron uhci_hcd ahci ata_piix ehci_hcd thermal firewire_ohci firewire_core crc_itu_t sky2 floppy libata scsi_mod thermal_sys usbcore nls_base [last unloaded: scsi_wait_scan]
Dec 28 11:33:38 beriserv kernel: [   20.820173] Pid: 9, comm: events/0 Not tainted 2.6.32-5-amd64 #1 P5K-E
Dec 28 11:33:38 beriserv kernel: [   20.820173] RIP: 0010:[810e6e5f]  [810e6e5f] kfree+0x55/0xcb
Dec 28 11:33:38 beriserv kernel: [   20.820173] RSP: 0018:88007fb9bd00  EFLAGS: 00010246
Dec 28 11:33:38 beriserv kernel: [   20.820173] RAX:  RBX: 0001 RCX: 8118fda5
Dec 28 11:33:38 beriserv kernel: [   20.820173] RDX: 0001 RSI: 880001811d10 RDI: ea00
Dec 28 11:33:38 beriserv kernel: [   20.820173] RBP: 8800 R08: 88007fb9a000 R09: 81452c20
Dec 28 11:33:38 beriserv kernel: [   20.820173] R10: 880001812ba0 R11: dead00200200 R12: a000f6b8
Dec 28 11:33:38 beriserv kernel: [   20.820173] R13: 0001 R14: 88007ca3f7c0 R15: ff91
Dec 28 11:33:38 beriserv kernel: [   20.820173] FS:  () GS:88000180() knlGS:
Dec 28 11:33:38 beriserv kernel: [   20.820173] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
Dec 28 11:33:38 beriserv kernel: [   20.820173] CR2: 7f873667f000 CR3: 379d8000 CR4: 06f0
Dec 28 11:33:38 beriserv kernel: [   20.820173] DR0:  DR1:  DR2:
Dec 28 11:33:38 beriserv kernel: [   20.820173] DR3:  DR6: 0ff0 DR7: 0400
Dec 28 11:33:38 beriserv kernel: [   20.820173] Process events/0 (pid: 9, threadinfo 88007fb9a000, task 88007fba)
Dec 28 11:33:38 beriserv kernel: [   20.820173] Stack:
Dec 28 11:33:38 beriserv kernel: [   20.820173]  0001 8800 0008 a000f6b8
Dec 28 11:33:38 beriserv kernel: [   20.820173] 0 0001 82800581 88007c8ca000 8104800d
Dec 28 11:33:38 beriserv kernel: [   20.820173] 0 000100015780 88007ca3f1a0 880001818180 ff91
Dec 28 11:33:38 beriserv kernel: [   20.820173] Call Trace:
Dec 28 11:33:38 beriserv kernel: [   20.820173]  [a000f6b8] ? usb_control_msg+0x124/0x135 [usbcore]
Dec 28 11:33:38 beriserv kernel: [   20.820173]  [8104800d] ? finish_task_switch+0x3a/0xaf
Dec 28 11:33:38 beriserv kernel: [   20.820173]  [a031bd2e] ? rtl818x_ioread8+0x61/0x7e [rtl8187]
Dec 28 11:33:38 beriserv kernel: [   20.820173]  [a031bd6f] ? rtl8187_is_radio_enabled+0x24/0xc0 [rtl8187]
Dec 28 11:33:38 beriserv kernel: [   20.820173]  [a031be30] ? rtl8187_rfkill_poll+0x25/0x78 [rtl8187]
Dec 28 11:33:38 beriserv kernel: [   20.820173]  [a023be5d] ? rfkill_poll+0x1b/0x31 [rfkill]
Dec 28 11:33:38 beriserv kernel: [   20.820173]  [810615c3] ?

Bug#608278: linux-image-2.6.32-5-amd64: Random general protection faults at boot

2010-12-29 Thread Ben Hutchings
On Wed, 2010-12-29 at 17:12 +0100, Andrea Spadaccini wrote:
 Package: linux-2.6
 Version: 2.6.32-29
 Severity: critical
 Justification: breaks the whole system

This does *not* break the whole system - the package works for other
people and even you are able to reboot into another kernel version.

 The system randomly (say, 2 times every 3 boots) shows General Protection 
 Fault
 errors at boot time. I don't get to multiuser init levels.
[...]
 I found in /var/log/syslog.0 a similar trace, but I 
 don't know if it's the same problem. Here it is:
 
 Dec 28 11:33:38 beriserv kernel: [   20.816204] [ cut here 
 ]
 Dec 28 11:33:38 beriserv kernel: [   20.817009] kernel BUG at 
 /build/buildd-linux-2.6_2.6.32-28-amd64-EUJiNq/linux-2.6-2.6.32/debian/build/source_amd64_none/mm/slub.c:2969!
 Dec 28 11:33:38 beriserv kernel: [   20.817895] invalid opcode:  [#1] SMP 
 Dec 28 11:33:38 beriserv kernel: [   20.818786] last sysfs file: 
 /sys/module/processor/initstate
 Dec 28 11:33:38 beriserv kernel: [   20.819697] CPU 0 
 Dec 28 11:33:38 beriserv kernel: [   20.820173] Modules linked in: 
 acpi_cpufreq cpufreq_conservative cpufreq_powersave cpufreq_userspace 
 cpufreq_stats fuse loop firewire_sbp2 snd_hda_codec_analog snd_hda_intel 
 snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm arc4 snd_seq_midi 
 snd_rawmidi snd_seq_midi_event snd_seq ecb snd_timer radeon snd_seq_device 
 rtl8187 ttm drm_kms_helper mac80211 led_class cfg80211 drm i2c_algo_bit snd 
 rfkill i2c_i801 soundcore eeprom_93cx6 i2c_core button snd_page_alloc evdev 
 pcspkr asus_atk0110 processor serio_raw ext3 jbd mbcache usbhid hid sg sd_mod 
 crc_t10dif sr_mod cdrom ata_generic pata_jmicron uhci_hcd ahci ata_piix 
 ehci_hcd thermal firewire_ohci firewire_core crc_itu_t sky2 floppy libata 
 scsi_mod thermal_sys usbcore nls_base [last unloaded: scsi_wait_scan]
 Dec 28 11:33:38 beriserv kernel: [   20.820173] Pid: 9, comm: events/0 Not 
 tainted 2.6.32-5-amd64 #1 P5K-E
 Dec 28 11:33:38 beriserv kernel: [   20.820173] RIP: 
 0010:[810e6e5f]  [810e6e5f] kfree+0x55/0xcb
 Dec 28 11:33:38 beriserv kernel: [   20.820173] RSP: 0018:88007fb9bd00  
 EFLAGS: 00010246
 Dec 28 11:33:38 beriserv kernel: [   20.820173] RAX:  RBX: 
 0001 RCX: 8118fda5
 Dec 28 11:33:38 beriserv kernel: [   20.820173] RDX: 0001 RSI: 
 880001811d10 RDI: ea00
 Dec 28 11:33:38 beriserv kernel: [   20.820173] RBP: 8800 R08: 
 88007fb9a000 R09: 81452c20
 Dec 28 11:33:38 beriserv kernel: [   20.820173] R10: 880001812ba0 R11: 
 dead00200200 R12: a000f6b8
 Dec 28 11:33:38 beriserv kernel: [   20.820173] R13: 0001 R14: 
 88007ca3f7c0 R15: ff91
 Dec 28 11:33:38 beriserv kernel: [   20.820173] FS:  () 
 GS:88000180() knlGS:
 Dec 28 11:33:38 beriserv kernel: [   20.820173] CS:  0010 DS: 0018 ES: 0018 
 CR0: 8005003b
 Dec 28 11:33:38 beriserv kernel: [   20.820173] CR2: 7f873667f000 CR3: 
 379d8000 CR4: 06f0
 Dec 28 11:33:38 beriserv kernel: [   20.820173] DR0:  DR1: 
  DR2: 
 Dec 28 11:33:38 beriserv kernel: [   20.820173] DR3:  DR6: 
 0ff0 DR7: 0400
 Dec 28 11:33:38 beriserv kernel: [   20.820173] Process events/0 (pid: 9, 
 threadinfo 88007fb9a000, task 88007fba)
 Dec 28 11:33:38 beriserv kernel: [   20.820173] Stack:
 Dec 28 11:33:38 beriserv kernel: [   20.820173]  0001 
 8800 0008 a000f6b8
 Dec 28 11:33:38 beriserv kernel: [   20.820173] 0 0001 
 82800581 88007c8ca000 8104800d
 Dec 28 11:33:38 beriserv kernel: [   20.820173] 0 000100015780 
 88007ca3f1a0 880001818180 ff91
 Dec 28 11:33:38 beriserv kernel: [   20.820173] Call Trace:
 Dec 28 11:33:38 beriserv kernel: [   20.820173]  [a000f6b8] ? 
 usb_control_msg+0x124/0x135 [usbcore]
 Dec 28 11:33:38 beriserv kernel: [   20.820173]  [8104800d] ? 
 finish_task_switch+0x3a/0xaf
 Dec 28 11:33:38 beriserv kernel: [   20.820173]  [a031bd2e] ? 
 rtl818x_ioread8+0x61/0x7e [rtl8187]
 Dec 28 11:33:38 beriserv kernel: [   20.820173]  [a031bd6f] ? 
 rtl8187_is_radio_enabled+0x24/0xc0 [rtl8187]
 Dec 28 11:33:38 beriserv kernel: [   20.820173]  [a031be30] ? 
 rtl8187_rfkill_poll+0x25/0x78 [rtl8187]
[...]

This is presumably caused by a bug in rtl8187.  However, it is not a
general protection fault so there may be two different bugs here.
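
One way to see why the trace points at rtl8187 is to tally which module each call-trace frame belongs to. The Python sketch below does this for the five frames quoted above, with the mail wrapping undone. The regular expression and the helper name `frames` are illustrative assumptions, not part of any kernel tooling.

```python
import re
from collections import Counter

# Matches "symbol+0xoffset/0xsize" with an optional "[module]" on the same line.
FRAME = re.compile(r"(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:[ \t]+\[(\w+)\])?")


def frames(log_text):
    """Return (symbol, module) pairs; module is 'kernel' for built-in code."""
    return [(sym, mod or "kernel") for sym, mod in FRAME.findall(log_text)]


# Frames copied from the quoted trace, one per line.
SAMPLE = """\
[a000f6b8] ? usb_control_msg+0x124/0x135 [usbcore]
[8104800d] ? finish_task_switch+0x3a/0xaf
[a031bd2e] ? rtl818x_ioread8+0x61/0x7e [rtl8187]
[a031bd6f] ? rtl8187_is_radio_enabled+0x24/0xc0 [rtl8187]
[a031be30] ? rtl8187_rfkill_poll+0x25/0x78 [rtl8187]
"""

if __name__ == "__main__":
    # Count how many frames each module contributes to the trace.
    print(Counter(mod for _, mod in frames(SAMPLE)).most_common())
```

A tally like this is only a hint: frames marked with `?` are unreliable stack entries, so the module that appears most often is a suspect rather than a proven culprit.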

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.




Bug#608278: linux-image-2.6.32-5-amd64: Random general protection faults at boot

2010-12-29 Thread Steven Chamberlain
Hi Andrea,

Your kernel Oops shows a very similar stack trace to the rtl8187 problem
I reported in 2.6.32-5-openvz-amd64 2.6.32-2, which would be followed by
tasks locking up and then the whole system crashing:

http://bugs.debian.org/596649

In my situation this would only happen after several hours/days of the
rtl8187 USB wireless adapter (0bda:8187) being active, but maybe that's
because it was idle for most of the time.  If you use it more heavily
(perhaps it's your primary network interface, right from boot) then
maybe that's what triggers it more quickly for you.

Regards,
-- 
Steven Chamberlain
ste...@pyro.eu.org






Bug#608278: linux-image-2.6.32-5-amd64: Random general protection faults at boot

2010-12-29 Thread Steven Chamberlain
Hi again,

Andrea said the issue wasn't reproducible in 2.6.32-3-amd64.  Does
this mean Debian package version 2.6.32-9?  (cat /proc/version)
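
To illustrate the distinction being asked about: the name 2.6.32-3-amd64 is the kernel ABI name, while the Debian package version is a separate string that Debian kernels of this era are assumed here to print in parentheses in /proc/version. The Python sketch below extracts both. The sample string is an illustrative reconstruction using versions from this thread, and its exact layout (including the builder address) is an assumption.

```python
import re

# Assumed layout: "Linux version <ABI name> (Debian <package version>) ..."
PATTERN = re.compile(r"Linux version (\S+) \(Debian ([^)]+)\)")


def debian_kernel_versions(proc_version):
    """Return (abi_name, package_version), or None if the layout differs."""
    match = PATTERN.search(proc_version)
    return match.groups() if match else None


# Hypothetical /proc/version contents, not taken from the reporter's machine.
SAMPLE = ("Linux version 2.6.32-5-amd64 (Debian 2.6.32-29) "
          "(builder@example.org) (gcc version 4.3.5) #1 SMP")

if __name__ == "__main__":
    print(debian_kernel_versions(SAMPLE))
```

On a real system the argument would be the contents of /proc/version; a `None` result just means the kernel does not follow the assumed layout.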

I definitely saw the rtl8187 bug in 2.6.32-21, so I would guess this bug
was introduced in -12 or -10 when most of the changes to rtl818x or
mac80211 code were made.

In a few days I'll have my rtl8187 device working again to test, so I'll
try to reproduce the issue in each of these kernel versions.

If Andrea is able to test these, the full range of linux-image and
linux-base packages can be found here:

http://snapshot.debian.org/package/linux-2.6/

Thanks,
Regards,
-- 
Steven Chamberlain
ste...@pyro.eu.org


