Bug#675302: nouveau: hard lockup when gdm3 starts

2012-10-14 Thread Gedalya

What's up?
Linux 3.2.30-1 (currently in sid) still has the same problem.


--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/507afad7.6010...@gedalya.net



Bug#675302: nouveau: hard lockup when gdm3 starts

2012-06-01 Thread Jonathan Nieder
Gedalya wrote:

 Now tried running startx /usr/bin/xterm with nouveau,

 [   82.427553] [drm] nouveau :01:00.0: PDISP: DCB for 6/0xbad00103 not found
 [   82.428536] [drm] nouveau :01:00.0: PDISP: DCB for 0/0xbad00103 not found
 [   82.429483] [drm] nouveau :01:00.0: Table 0x0103 not found for 0/2, using first

 I kept a previously opened ssh connection.
 When starting X, the screen went black, but didn't totally lock up
 until I killed the X process from SSH. No further netconsole output,
 the machine went totally dead.

Worrisome.  Can you send a full kernel log from booting and doing
this, including the boot-up sequence?  Please send it as an attachment
if possible so the log doesn't get corrupted in transit (e.g. by line
wrapping).
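
For a log that has already been wrapped by a mail client, a rough repair is possible. The following is a minimal sketch, not a tool used in this thread: it assumes every genuine kernel line starts with a bracketed timestamp such as "[   82.427553]", and treats any other non-blank line as a continuation of the previous one. The function name is hypothetical, and the heuristic will misjoin output that legitimately lacks a timestamp.

```python
import re

# A genuine kernel log line is assumed to start with "[  seconds.micros]".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]")


def unwrap_kernel_log(lines):
    """Join mail-wrapped continuation lines onto the preceding log line."""
    out = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # Blank lines are treated as mail-client padding and dropped.
            continue
        if TIMESTAMP.match(line) or not out:
            out.append(line)
        else:
            # Wraps happen at spaces, so rejoin with exactly one space.
            out[-1] = out[-1].rstrip() + " " + line.strip()
    return out
```

Sending the log as an attachment, as requested above, avoids the need for any of this.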

Thanks,
Jonathan



-- 
Archive: http://lists.debian.org/20120601055746.GC28116@burratino



Bug#675302: nouveau: hard lockup when gdm3 starts

2012-06-01 Thread Gedalya

On 6/1/2012 1:57 AM, Jonathan Nieder wrote:

Worrisome.  Can you send a full kernel log from booting and doing
this, including the boot-up sequence?  Please send it as an attachment
if possible so the log doesn't get corrupted in transit (e.g. by line
wrapping).


Saved the log from partedmagic too just in case it can help. Looks like 
the ring buffer here is too small to keep the earliest boot messages.



root@PartedMagic:~# uname -a
Linux PartedMagic 3.3.6-pmagic #1 SMP Sat May 12 20:01:06 CDT 2012 i686 Intel(R) Core(TM)2 Quad CPU Q9550  @ 2.83GHz GenuineIntel GNU/Linux

[0.303349] pci_bus :06: resource 7 [mem 0x000c-0x000d]
[0.303669] pci_bus :06: resource 8 [mem 0xfed4-0xfed44fff]
[0.303988] pci_bus :06: resource 9 [mem 0xd7f0-0xfebf]
[0.304334] NET: Registered protocol family 2
[0.304677] IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
[0.305082] TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
[0.305904] TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
[0.306367] TCP: Hash tables configured (established 131072 bind 65536)
[0.306688] TCP reno registered
[0.306995] UDP hash table entries: 512 (order: 2, 16384 bytes)
[0.307335] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
[0.307692] NET: Registered protocol family 1
[0.308104] RPC: Registered named UNIX socket transport module.
[0.308422] RPC: Registered udp transport module.
[0.308735] RPC: Registered tcp transport module.
[0.309061] RPC: Registered tcp NFSv4.1 backchannel transport module.
[0.331043] pci :01:00.0: Boot video device
[0.331379] PCI: CLS 32 bytes, default 64
[0.331725] Trying to unpack rootfs image as initramfs...
[0.440926] Freeing initrd memory: 38420k freed
[0.453397] audit: initializing netlink socket (disabled)
[0.453728] type=2000 audit(1338532572.452:1): initialized
[0.454640] highmem bounce pool size: 64 pages
[0.459860] VFS: Disk quotas dquot_6.5.2
[0.460287] Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
[0.460734] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[0.461297] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
[0.461618] ROMFS MTD (C) 2007 Red Hat, Inc.
[0.462329] aufs 3.3-20120402
[0.462635] msgmni has been set to 1421
[0.466088] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[0.46] io scheduler noop registered
[0.466973] io scheduler deadline registered
[0.467307] io scheduler cfq registered (default)
[0.467989] pcieport :00:01.0: irq 40 for MSI/MSI-X
[0.468433] pcieport :00:1c.0: irq 41 for MSI/MSI-X
[0.468869] pcieport :00:1c.3: irq 42 for MSI/MSI-X
[0.469337] pcieport :00:1c.4: irq 43 for MSI/MSI-X
[0.469768] pcieport :00:1c.5: irq 44 for MSI/MSI-X
[0.470411] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[0.471043] isapnp: Scanning for PnP cards...
[0.826975] isapnp: No Plug & Play device found
[0.833490] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[1.098144] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[1.119600] 00:07: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[1.123779] brd: module loaded
[1.180634] loop: module loaded
[1.181738] Loading iSCSI transport class v2.0-870.
[1.183443] i8042: PNP: No PS/2 controller found. Probing ports directly.
[1.184214] serio: i8042 KBD port at 0x60,0x64 irq 1
[1.184545] serio: i8042 AUX port at 0x60,0x64 irq 12
[1.185132] mousedev: PS/2 mouse device common for all mice
[1.185756] EISA: Probing bus 0 at eisa.0
[1.186106] EISA: Cannot allocate resource for mainboard
[1.186420] Cannot allocate resource for EISA slot 1
[1.186733] Cannot allocate resource for EISA slot 2
[1.187058] Cannot allocate resource for EISA slot 3
[1.187370] Cannot allocate resource for EISA slot 4
[1.187681] Cannot allocate resource for EISA slot 5
[1.187992] Cannot allocate resource for EISA slot 6
[1.188325] Cannot allocate resource for EISA slot 7
[1.188637] Cannot allocate resource for EISA slot 8
[1.188948] EISA: Detected 0 cards.
[1.189275] cpuidle: using governor ladder
[1.189583] cpuidle: using governor menu
[1.190029] TCP cubic registered
[1.190334] Initializing XFRM netlink socket
[1.190646] NET: Registered protocol family 17
[1.191120] Registering the dns_resolver key type
[1.191450] Using IPI No-Shortcut mode
[1.299537] registered taskstats version 1
[1.300194] drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[1.300612] Freeing unused kernel memory: 556k freed
[1.301106] Write protecting the kernel text: 3548k
[1.301451] Write protecting the kernel read-only data: 1340k
[1.454034] Refined TSC clocksource calibration: 2833.010 MHz.
[1.454039] Switching to clocksource tsc
[1.527281] 

Bug#675302: nouveau: hard lockup when gdm3 starts

2012-06-01 Thread Jonathan Nieder
Gedalya wrote:

 Here.

Perfect, thanks much.

Ok, one more test and we should take this upstream: can you reproduce
this with a 3.3.y or newer kernel from experimental?

If so, please report this at http://bugs.freedesktop.org/, product
Xorg, component Driver/nouveau (yes, that's where they track their
kernel bugs, too), and let us know the bug number so we can track it.

http://nouveau.freedesktop.org/wiki/Bugs has more advice.  xterm uses
2d, GNOME 3 uses some 3d features.

Hope that helps,
Jonathan



-- 
Archive: http://lists.debian.org/20120601064714.GD28116@burratino



Bug#675302: nouveau: hard lockup when gdm3 starts

2012-06-01 Thread Gedalya

On 6/1/2012 2:47 AM, Jonathan Nieder wrote:

Ok, one more test and we should take this upstream: can you reproduce
this with a 3.3.y or newer kernel from experimental?

If so, please report this at http://bugs.freedesktop.org/, product
Xorg, component Driver/nouveau (yes, that's where they track their
kernel bugs, too), and let us know the bug number so we can track it.


Bug 50571 - https://bugs.freedesktop.org/show_bug.cgi?id=50571




--
Archive: http://lists.debian.org/4fc86bc2.7010...@gedalya.net



Bug#675302: nouveau: hard lockup when gdm3 starts

2012-06-01 Thread Jonathan Nieder
found 675302 linux-2.6/3.3.6-1~experimental.1
affects 675302 + xserver-xorg-video-nouveau
forwarded 675302 https://bugs.freedesktop.org/50571
quit

Gedalya wrote:

 Bug 50571 - https://bugs.freedesktop.org/show_bug.cgi?id=50571

Thanks!  If I have any more questions, I'll just ask them upstream.



-- 
Archive: http://lists.debian.org/20120601073154.GA30339@burratino



Bug#675302: nouveau: hard lockup when gdm3 starts

2012-05-31 Thread Jonathan Nieder
Hi,

Gedalya wrote:

 Tried removing the nvidia stuff and booting up with nouveau.
[...]
 Total system hang as soon as xorg starts. No response from keyboard,
 mouse, no response on the network (no ping, no ARP response)
[...]
 Using an Nvidia GeForce GT 520.

Do I understand correctly that the system works fine until X starts
(e.g., if you use the "text" kernel command line option)?  Does X with
the fbdev driver work?  (You can test by putting the following in
/etc/X11/xorg.conf.)

    Section "Device"
        Identifier "geforce"
        Driver "fbdev"
    EndSection

Does starting X with the nouveau driver work if you do not start a
GNOME session?  (You can test by running "startx xterm" if the xinit
package is installed.)

Thanks for a clear report.

Hope that helps,
Jonathan



-- 
Archive: http://lists.debian.org/20120531055946.GC1447@burratino



Bug#675302: nouveau: hard lockup when gdm3 starts

2012-05-31 Thread Jonathan Nieder
Gedalya wrote:
 On 5/31/2012 1:59 AM, Jonathan Nieder wrote:

 Does starting X with the nouveau driver work if you do not start a
 GNOME session?  (You can test by running startx xterm if the xinit
 package is installed.)

 Interesting. Still working on this one. startx was complaining
 something about xorg.conf so I just renamed it, and then it started
 up. Pretty frozen at this point, no keyboard, only reset button
 helps, but I do get network response - there is ping, initial ssh
 response but nothing more, can't actually log in.

Hm.  Might be possible to get a log with netconsole[1].

[1] http://www.kernel.org/doc/Documentation/networking/netconsole.txt
http://blog.mraw.org/2010/11/08/Debugging_using_netconsole/
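
As a rough illustration of what the first document describes, the netconsole module takes one parameter string of the form [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]. The sketch below only assembles that string; the helper name and the addresses used with it are placeholders, not values from this bug report. Fields left empty fall back to the kernel's defaults, and only the target IP is mandatory.

```python
def netconsole_param(tgt_ip, dev="", src_port="", src_ip="",
                     tgt_port="", tgt_mac=""):
    """Build the value for the netconsole= module parameter.

    Layout: [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
    Empty fields are left for the kernel to fill with its defaults.
    """
    if not tgt_ip:
        raise ValueError("the target IP is the only mandatory field")
    return f"{src_port}@{src_ip}/{dev},{tgt_port}@{tgt_ip}/{tgt_mac}"


# Placeholder address of the machine that will receive the log.
print("modprobe netconsole netconsole=" + netconsole_param("10.0.0.2"))
```

The printed command would be run as root on the machine that hangs, with a UDP listener already waiting on the receiving machine.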



-- 
Archive: http://lists.debian.org/20120531063143.GD1447@burratino



Bug#675302: nouveau: hard lockup when gdm3 starts

2012-05-31 Thread Gedalya

On 5/31/2012 2:31 AM, Jonathan Nieder wrote:

Gedalya wrote:

On 5/31/2012 1:59 AM, Jonathan Nieder wrote:

Does starting X with the nouveau driver work if you do not start a
GNOME session?  (You can test by running startx xterm if the xinit
package is installed.)

Interesting. Still working on this one. startx was complaining
something about xorg.conf so I just renamed it, and then it started
up. Pretty frozen at this point, no keyboard, only reset button
helps, but I do get network response - there is ping, initial ssh
response but nothing more, can't actually log in.

Hm.  Might be possible to get a log with netconsole[1].

[1] http://www.kernel.org/doc/Documentation/networking/netconsole.txt
http://blog.mraw.org/2010/11/08/Debugging_using_netconsole/


Tried this again and this time I got a total hang again.

Then I got it to work with the config file, so I first tried fbdev and it
worked. When I switched to nouveau I again got the situation where it's
pretty much hung but still responds to ping.


SSH does respond with failed authentication when it's the wrong 
password, but no successful login is possible.


I'm gonna study netconsole now, let's see if I can figure it out.




--
Archive: http://lists.debian.org/4fc7117d.9040...@gedalya.net



Bug#675302: nouveau: hard lockup when gdm3 starts

2012-05-31 Thread Gedalya

On 5/31/2012 1:59 AM, Jonathan Nieder wrote:

Hi,

Gedalya wrote:


Tried removing the nvidia stuff and booting up with nouveau.

[...]

Total system hang as soon as xorg starts. No response from keyboard,
mouse, no response on the network (no ping, no ARP response)

[...]

Using an Nvidia GeForce GT 520.

Do I understand correctly that the system works fine until X starts
(e.g., if you use the text kernel command line option)?

Yes. As long as X doesn't start it seems rock solid.

   Does X with
the fbdev driver work?  (You can test by putting the following in
/etc/X11/xorg.conf.)

    Section "Device"
        Identifier "geforce"
        Driver "fbdev"
    EndSection


Yes. This does work. BTW driver vesa hangs the same.


Does starting X with the nouveau driver work if you do not start a
GNOME session?  (You can test by running startx xterm if the xinit
package is installed.)


Interesting. Still working on this one. startx was complaining something 
about xorg.conf so I just renamed it, and then it started up. Pretty 
frozen at this point, no keyboard, only reset button helps, but I do get 
network response - there is ping, initial ssh response but nothing more, 
can't actually log in.



Thanks for a clear report.

Hope that helps,
Jonathan





--
Archive: http://lists.debian.org/4fc70ec3.7040...@gedalya.net



Bug#675302: nouveau: hard lockup when gdm3 starts

2012-05-31 Thread Julien Cristau
On Thu, May 31, 2012 at 00:59:46 -0500, Jonathan Nieder wrote:

 (You can test by running startx xterm if the xinit
 package is installed.)
 
You mean startx /usr/bin/xterm.

Cheers,
Julien



-- 
Archive: http://lists.debian.org/20120531211540.gl31...@radis.cristau.org



Bug#675302: nouveau: hard lockup when gdm3 starts

2012-05-31 Thread Gedalya

On 5/31/2012 2:31 AM, Jonathan Nieder wrote:

Hm.  Might be possible to get a log with netconsole[1].

[1]http://www.kernel.org/doc/Documentation/networking/netconsole.txt
http://blog.mraw.org/2010/11/08/Debugging_using_netconsole/
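
netconsole sends the kernel log as plain UDP datagrams, so anything that listens on the chosen UDP port on the receiving machine can capture it. The following is a minimal sketch of such a receiver, not the setup actually used here: the function names are hypothetical, and port 6666 is assumed because it is netconsole's documented default target port.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket for incoming netconsole datagrams."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def collect_datagrams(sock, max_packets, timeout=5.0):
    """Return up to max_packets datagrams as text, stopping on timeout."""
    sock.settimeout(timeout)
    chunks = []
    for _ in range(max_packets):
        try:
            data, _sender = sock.recvfrom(65535)
        except socket.timeout:
            break  # nothing more arrived within the timeout
        # Kernel messages are not guaranteed to be valid UTF-8.
        chunks.append(data.decode("utf-8", errors="replace"))
    return chunks
```

In practice the receiver would be started before X and its output written to a file, since the last datagrams before a hang are the interesting ones.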


Now tried running startx /usr/bin/xterm with nouveau,

[   82.427553] [drm] nouveau :01:00.0: PDISP: DCB for 6/0xbad00103 not found
[   82.428536] [drm] nouveau :01:00.0: PDISP: DCB for 0/0xbad00103 not found
[   82.429483] [drm] nouveau :01:00.0: Table 0x0103 not found for 0/2, using first


I kept a previously opened ssh connection.
When starting X, the screen went black, but didn't totally lock up until 
I killed the X process from SSH. No further netconsole output, the 
machine went totally dead.


Rebooted, this time tried to start gdm3. This time we got some juice.

[   84.538008] [drm] nouveau :01:00.0: PDISP: DCB for 6/0xbad00103 not found
[   84.538990] [drm] nouveau :01:00.0: PDISP: DCB for 0/0xbad00103 not found
[   84.539937] [drm] nouveau :01:00.0: Table 0x0103 not found for 0/2, using first
[   85.216875] BUG: unable to handle kernel paging request at 8800f1d5f100
[   85.216940] IP: [a0473eb8] evo_wait.constprop.13+0x3f/0xaa [nouveau]

[   85.216991] PGD 1606063 PUD 1fffc067 PMD 0
[   85.217020] Oops: 0002 [#1] SMP
[   85.217045] CPU 2
[   85.217057] Modules linked in: usb_storage uas netconsole configfs 
nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc ext2 loop 
firewire_sbp2 tpm_infineon nouveau mxm_wmi snd_hda_codec_hdmi wmi video 
snd_usb_audio snd_usbmidi_lib uvcvideo snd_rawmidi snd_hda_codec_realtek 
ttm snd_seq_device drm_kms_helper videodev iTCO_wdt iTCO_vendor_support 
parport_pc parport psmouse pcspkr serio_raw drm i2c_algo_bit i2c_i801 
snd_hda_intel v4l2_compat_ioctl32 snd_hda_codec snd_hwdep snd_pcm 
snd_page_alloc media button processor tpm_tis tpm i2c_core tpm_bios 
evdev snd_timer snd soundcore thermal_sys ext4 crc16 jbd2 mbcache dm_mod 
raid1 md_mod sd_mod crc_t10dif sr_mod cdrom ata_generic usbhid hid 
pata_jmicron firewire_ohci firewire_core crc_itu_t uhci_hcd r8169 mii 
ahci libahci libata scsi_mod ehci_hcd usbcore usb_common [last unloaded: 
scsi_wait_scan]

[   85.217816]
[   85.217826] Pid: 2605, comm: Xorg Not tainted 3.2.0-2-amd64 #1 Gigabyte Technology Co., Ltd. EP45-UD3P/EP45-UD3P
[   85.217885] RIP: 0010:[a0473eb8]  [a0473eb8] evo_wait.constprop.13+0x3f/0xaa [nouveau]

[   85.217942] RSP: 0018:88021be35cd8  EFLAGS: 00010212
[   85.217969] RAX: 88003705f000 RBX: 88021cdc7000 RCX: 

[   85.218003] RDX: 2eb40040 RSI: 0064 RDI: 
88021cdc7000
[   85.218037] RBP: 88021b66bac0 R08:  R09: 
8168d880
[   85.218071] R10: ffea R11: ffea R12: 
2eb40050
[   85.218105] R13: 88021b859001 R14: 88021db4cbc0 R15: 
88021dc03c00
[   85.218140] FS:  7fbf30962880() GS:880227d0() 
knlGS:

[   85.218178] CS:  0010 DS:  ES:  CR0: 80050033
[   85.218207] CR2: 8800f1d5f100 CR3: 00021dac7000 CR4: 
000406e0
[   85.218241] DR0:  DR1:  DR2: 

[   85.218275] DR3:  DR6: 0ff0 DR7: 
0400
[   85.218310] Process Xorg (pid: 2605, threadinfo 88021be34000, task 88021ddb35d0)

[   85.218348] Stack:
[   85.218359]  88021b859000 88021cdc7000 88021cdc7001 
a0474303
[   85.218406]  88021b859001 88021b859000  
88021cdc7020
[   85.218451]  88021b859001 a04746f4 c01c64a3 
88021be35df0

[   85.218497] Call Trace:
[   85.218519]  [a0474303] ? nvd0_crtc_cursor_show+0x21/0xf8 [nouveau]
[   85.218562]  [a04746f4] ? nvd0_crtc_cursor_set+0xd5/0xf1 [nouveau]
[   85.218602]  [a029ad7c] ? drm_mode_cursor_ioctl+0xe5/0x13d [drm]

[   85.218703]  [a028f61f] ? drm_ioctl+0x289/0x35e [drm]
[   85.218738]  [a029ac97] ? drm_mode_setcrtc+0x376/0x376 [drm]
[   85.218772]  [8134c42d] ? do_page_fault+0x2fc/0x337
[   85.218802]  [81106605] ? do_vfs_ioctl+0x459/0x49a
[   85.218831]  [81106691] ? sys_ioctl+0x4b/0x72
[   85.218859]  [8134e452] ? system_call_fastpath+0x16/0x1b
[   85.21] Code: 89 fb 48 8b a8 08 0f 00 00 e8 06 f4 ff ff 89 c2 c1 ea 02 41 01 d4 41 81 fc ff 03 00 00 76 66 48 8b 45 10 be 00 00 64 00 48 89 df c7 04 90 00 00 00 20 31 d2 e8 ed f3 ff ff 45 31 c0 83 c9 ff ba
[   85.219253] RIP  [a0473eb8] evo_wait.constprop.13+0x3f/0xaa [nouveau]

[   85.219299]  RSP 88021be35cd8
[   85.219316] CR2: 8800f1d5f100
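
Each frame in the trace above, such as drm_ioctl+0x289/0x35e [drm], follows the kernel's symbol+offset/size [module] layout: the address lies at the given offset into a function of the given size, optionally inside a named module. The following is a minimal sketch for pulling those fields out of a line. The regular expression and the function name are illustrative assumptions, not part of any kernel tool.

```python
import re

# symbol+0xOFFSET/0xSIZE, optionally followed by " [module]"
FRAME = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frame(line):
    """Return symbol, offset, size and module for one trace line, or None."""
    match = FRAME.search(line)
    if match is None:
        return None
    return {
        "symbol": match.group("symbol"),
        "offset": int(match.group("offset"), 16),
        "size": int(match.group("size"), 16),
        "module": match.group("module"),  # None for built-in code
    }
```

Applied to the IP: line of this oops, it reports the fault at offset 0x3f into evo_wait.constprop.13 in the nouveau module, reached from the cursor ioctl path listed in the call trace.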


Tried pinging the machine at this point and netconsole printed this:

[  240.648014] INFO: task kworker/2:1:30 blocked for more than 120 seconds.
[  240.648064] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  240.648122] kworker/2:1 D