Bug#852698: Re: linux-image-4.9.0-1-amd64: nouveau seems to hang programs like lspci and Xorg
On Thu, Mar 30, 2017 at 12:57:13AM +0100, Ben Hutchings wrote:
> > the new testing 4.9.0 linux-image hangs on a machine with Optimus Technology
> > related to the nouveau kernel module - even an "lspci" hangs the process
> > hard.
> > A controlled shutdown via systemd hangs also indefinitely.
> > Kernel log is appended.
> >
> > Tried vanilla 4.9.6 with similar results. The older 4.8 package from testing
> > runs fine
> [...]
>
> This has been reported by another user as fixed in 4.9.10-1. Can you
> confirm whether this still occurs on your machine?

I'm running vanilla 4.9.15 at the moment and have not seen the problem since a
month or so. At least one time I started also the mentioned debian kernel and
did not see the problem in a short time.

If you like, I can start the debian kernel and report back in a week or so.

I've had some black screen events (last yesterday) on the internal screen
connected to the intel graphics chipset, but forgot to try a lspci so I don't
know if it's related (doubt it):

$ xrandr --output LVDS1 --mode 1600x900 --primary
xrandr: Configure crtc 0 failed

/var/log/Xorg.0.log:
[ 62721.387] (II) intel(0): switch to mode 1600x900@60.0 on LVDS1 using pipe 0, position (0, 0), rotation normal, reflection none
[ 62721.387] (EE) intel(0): failed to set mode: Invalid argument [22]

Please feel free to close (or renice) this bug#.

Thanks, greetings
Hermann

--
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 205; 69120 Heidelberg; Tel: (06221)54-14405 Fax: -14427
Email: hermann.la...@iwr.uni-heidelberg.de
Bug#852698: linux-image-4.9.0-1-amd64: nouveau seems to hang programs like lspci and Xorg
Package: src:linux
Version: 4.9.2-2
Severity: important

Dear Maintainer,

the new testing 4.9.0 linux-image hangs on a machine with Optimus Technology
related to the nouveau kernel module - even an "lspci" hangs the process hard.
A controlled shutdown via systemd hangs also indefinitely.
Kernel log is appended.

Tried vanilla 4.9.6 with similar results. The older 4.8 package from testing
runs fine

Please tell if more information is needed,
thanks
greetings
Hermann

-- Package-specific info:
** Kernel log: boot messages should be attached

** Model information
sys_vendor: LENOVO
product_name: 4242PT2
product_version: ThinkPad T520
chassis_vendor: LENOVO
chassis_version: Not Available
bios_vendor: LENOVO
bios_version: 8AET65WW (1.45 )
board_vendor: LENOVO
board_name: 4242PT2
board_version: Not Available

** PCI devices:
00:00.0 Host bridge [0600]: Intel Corporation 2nd Generation Core Processor Family DRAM Controller [8086:0104] (rev 09)
	Subsystem: Lenovo 2nd Generation Core Processor Family DRAM Controller [17aa:21cf]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR-

00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port [8086:0101] (rev 09) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities:
	Kernel driver in use: pcieport
	Kernel modules: shpchp

00:02.0 VGA compatible controller [0300]: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller [8086:0126] (rev 09) (prog-if 00 [VGA controller])
	Subsystem: Lenovo 2nd Generation Core Processor Family Integrated Graphics Controller [17aa:21d1]
	Control: I/O+ Mem+ BusMaster+ SpecCycle-
MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- Kernel driver in use: i915 Kernel modules: i915 00:16.0 Communication controller [0780]: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 [8086:1c3a] (rev 04) Subsystem: Lenovo 6 Series/C200 Series Chipset Family MEI Controller [17aa:21cf] Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- Kernel driver in use: mei_me Kernel modules: mei_me 00:19.0 Ethernet controller [0200]: Intel Corporation 82579LM Gigabit Network Connection [8086:1502] (rev 04) Subsystem: Lenovo 82579LM Gigabit Network Connection [17aa:21ce] Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- Kernel driver in use: e1000e Kernel modules: e1000e 00:1a.0 USB controller [0c03]: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 [8086:1c2d] (rev 04) (prog-if 20 [EHCI]) Subsystem: Lenovo 6 Series/C200 Series Chipset Family USB Enhanced Host Controller [17aa:21cf] Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- Kernel driver in use: ehci-pci Kernel modules: ehci_pci 00:1b.0 Audio device [0403]: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller [8086:1c20] (rev 04) Subsystem: Lenovo 6 Series/C200 Series Chipset Family High Definition Audio Controller [17aa:21cf] Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- Kernel driver in use: snd_hda_intel Kernel modules: snd_hda_intel 00:1c.0 PCI bridge [0604]: 
Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 [8086:1c10] (rev b4) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: Kernel driver in use: pcieport Kernel modules: shpchp 00:1c.1 PCI bridge [0604]: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 2 [8086:1c12] (rev b4) (prog-if 00 [Normal decode])
Bug#770479: [PATCH] Re: nbd: Fix timeout detection
Hello Ben,

On Mon, Sep 28, 2015 at 01:29:23AM +0100, Ben Hutchings wrote:
> > There has been no unusual messages in the kernel log during a disconnection
> > which happened (server process restart during update when Wouter fixed the
> > nbd-server access code).
> >
> > If more testing or something else is needed, please tell me.
>
> After reviewing the patch, I'm afraid I can't accept it. Hopefully
> upstream will come up with a proper fix in response to my comments.

Based on Markus' fix for 4.3, appended is a backported fix to be applied on top.
Tested again with the current jessie kernel without problems, as written above.

Please review and tell me if I can do more.

Thanks a lot,
Hermann

--
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: hermann.la...@iwr.uni-heidelberg.de

Backport for jessie kernel, based on:

The timeout handling introduced in
	7e2893a16d3e (nbd: Fix timeout detection)
introduces a race condition which may lead to killing of tasks that are
not in nbd context anymore. This was not observed or reproducible yet.

This patch adds locking to critical use of task_recv and task_send to
avoid killing tasks that already left the NBD thread functions. This
lock is only acquired if a timeout occurs or the nbd device
starts/stops.

Reported-by: Ben Hutchings <b...@decadent.org.uk>
Tested-by: Hermann Lauer <hermann.la...@iwr.uni-heidelberg.de>
---
 drivers/block/nbd.c | 36 ++--
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -142,21 +142,23 @@
 static void nbd_xmit_timeout(unsigned long arg)
 {
 	struct nbd_device *nbd = (struct nbd_device *)arg;
-	struct task_struct *task;
+	unsigned long flags;

 	if (list_empty(&nbd->queue_head))
 		return;

 	nbd->disconnect = 1;

-	task = READ_ONCE(nbd->task_recv);
-	if (task)
-		force_sig(SIGKILL, task);
+	spin_lock_irqsave(&nbd->tasks_lock, flags);

-	task = READ_ONCE(nbd->task_send);
-	if (task)
+	if (nbd->task_recv)
+		force_sig(SIGKILL, nbd->task_recv);
+
+	if (nbd->task_send)
 		force_sig(SIGKILL, nbd->task_send);

+	spin_unlock_irqrestore(&nbd->tasks_lock, flags);
+
 	dev_err(disk_to_dev(nbd->disk), "Connection timed out, killed receiver and sender, shutting down connection\n");
 }

@@ -408,6 +410,7 @@
 {
 	struct request *req;
 	int ret;
+	unsigned long flags;

 	BUG_ON(nbd->magic != NBD_MAGIC);

@@ -425,7 +428,9 @@
 	while ((req = nbd_read_stat(nbd)) != NULL)
 		nbd_end_request(req);

+	spin_lock_irqsave(&nbd->tasks_lock, flags);
 	nbd->task_recv = NULL;
+	spin_unlock_irqrestore(&nbd->tasks_lock, flags);

 	if (signal_pending(current)) {
 		siginfo_t info;
@@ -541,8 +546,11 @@
 {
 	struct nbd_device *nbd = data;
 	struct request *req;
+	unsigned long flags;

+	spin_lock_irqsave(&nbd->tasks_lock, flags);
 	nbd->task_send = current;
+	spin_unlock_irqrestore(&nbd->tasks_lock, flags);

 	set_user_nice(current, MIN_NICE);
 	while (!kthread_should_stop() || !list_empty(&nbd->waiting_queue)) {
@@ -577,7 +585,15 @@
 		nbd_handle_req(nbd, req);
 	}

+	spin_lock_irqsave(&nbd->tasks_lock, flags);
 	nbd->task_send = NULL;
+	spin_unlock_irqrestore(&nbd->tasks_lock, flags);
+
+	/* Clear maybe pending signals */
+	if (signal_pending(current)) {
+		siginfo_t info;
+		dequeue_signal_lock(current, &current->blocked, &info);
+	}

 	return 0;
 }
@@ -902,6 +918,7 @@
 		nbd_dev[i].magic = NBD_MAGIC;
 		INIT_LIST_HEAD(&nbd_dev[i].waiting_queue);
 		spin_lock_init(&nbd_dev[i].queue_lock);
+		spin_lock_init(&nbd_dev[i].tasks_lock);
 		INIT_LIST_HEAD(&nbd_dev[i].queue_head);
 		mutex_init(&nbd_dev[i].tx_lock);
 		init_timer(&nbd_dev[i].timeout_timer);
diff --git a/include/linux/nbd.h b/include/linux/nbd.h
--- a/include/linux/nbd.h
+++ b/include/linux/nbd.h
@@ -43,6 +43,7 @@
 	int disconnect; /* a disconnect has been requested by user */

 	struct timer_list timeout_timer;
+	spinlock_t tasks_lock;
 	struct task_struct *task_recv;
 	struct task_struct *task_send;
 };
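Editorial illustration (not part of the original mail): the locking idea of the backport can be sketched in plain userspace C. This is only a model under stated assumptions. `struct worker`, `struct dev_model` and the `dev_*` helpers are hypothetical names, a pthread mutex stands in for the kernel spinlock taken with `spin_lock_irqsave()`, setting a `killed` flag stands in for `force_sig(SIGKILL, ...)`, and the `list_empty()` early return of the real handler is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel task; "killed" models force_sig(SIGKILL). */
struct worker {
	bool killed;
};

/* Hypothetical userspace model of the fields the patch touches. */
struct dev_model {
	pthread_mutex_t tasks_lock;	/* models the new spinlock_t tasks_lock */
	struct worker *task_recv;
	struct worker *task_send;
	int disconnect;
};

void dev_init(struct dev_model *d)
{
	assert(d != NULL);
	pthread_mutex_init(&d->tasks_lock, NULL);
	d->task_recv = NULL;
	d->task_send = NULL;
	d->disconnect = 0;
}

/* Thread functions register or clear their pointer only under the lock. */
void dev_set_recv(struct dev_model *d, struct worker *w)
{
	pthread_mutex_lock(&d->tasks_lock);
	d->task_recv = w;
	pthread_mutex_unlock(&d->tasks_lock);
}

void dev_set_send(struct dev_model *d, struct worker *w)
{
	pthread_mutex_lock(&d->tasks_lock);
	d->task_send = w;
	pthread_mutex_unlock(&d->tasks_lock);
}

/*
 * Timeout path: the pointers are read and used while the lock is held, so a
 * worker that has already cleared its pointer is never signalled.
 * Returns the number of workers that were signalled.
 */
int dev_timeout(struct dev_model *d)
{
	int signalled = 0;

	d->disconnect = 1;

	pthread_mutex_lock(&d->tasks_lock);
	if (d->task_recv) {
		d->task_recv->killed = true;
		signalled++;
	}
	if (d->task_send) {
		d->task_send->killed = true;
		signalled++;
	}
	pthread_mutex_unlock(&d->tasks_lock);

	return signalled;
}
```

In this model a sender that has left its thread function (pointer cleared via `dev_set_send(d, NULL)`) is skipped by `dev_timeout()`, while a still-registered receiver is signalled. The kernel patch needs the irqsave variant of the lock because the timer callback does not run in process context; the mutex here is only a simplification.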
Bug#770479: [PATCH] nbd: Fix timeout detection
Hello Ben,

On Mon, Aug 17, 2015 at 09:36:08PM +0200, Hermann Lauer wrote:
> attached is the email from the nbd Maintainer to Jens Axboe and the kernel
> list which fixes the timeout issue.

the patch contained in the email mentioned above is meanwhile contained in the
vanilla kernel (checked in v4.3-rc2 today) and is functioning as expected on
stable 3.16.7-ckt11-1.1 (2015-07-16) x86_64.

There have been no unusual messages in the kernel log during a disconnection
which happened (server process restart during update when Wouter fixed the
nbd-server access code).

If more testing or something else is needed, please tell me.

Thanks, greetings
Hermann

--
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: hermann.la...@iwr.uni-heidelberg.de
Bug#770479: [PATCH] nbd: Fix timeout detection
Hello Ben,

attached is the email from the nbd Maintainer to Jens Axboe and the kernel list
which fixes the timeout issue.

Thanks for caring,
greetings
Hermann

--
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: hermann.la...@iwr.uni-heidelberg.de

---BeginMessage---
At the moment the nbd timeout just detects hanging tcp operations. This
is not enough to detect a hanging or bad connection as expected of a
timeout.

This patch redesigns the timeout detection to include some more cases.
The timeout is now in relation to replies from the server. If the server
does not send replies within the timeout the connection will be shut
down.

The patch adds a continuous timer 'timeout_timer' that is set up in one of
two cases:
 - The request list is empty and we are sending the first request out to
   the server. We want to have a reply within the given timeout,
   otherwise we consider the connection to be dead.
 - A server response was received. This means the server is still
   communicating with us. The timer is reset to the timeout value.

The timer is not stopped if the list becomes empty. It will just trigger
a timeout which will directly leave the handling routine again as the
request list is empty.

The whole patch does not use any additional explicit locking. The
list_empty() calls are safe to be used concurrently. The timer is locked
internally as we just use mod_timer and del_timer_sync().

The patch is based on the idea of Michal Belczyk with a previous
different implementation.

Cc: Michal Belczyk belc...@bsd.krakow.pl
Cc: Hermann Lauer hermann.la...@iwr.uni-heidelberg.de
Signed-off-by: Markus Pargmann m...@pengutronix.de
Tested-by: Hermann Lauer hermann.la...@iwr.uni-heidelberg.de
---
 drivers/block/nbd.c | 98 ++---
 1 file changed, 70 insertions(+), 28 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 0e385d8e9b86..7b9ae7a65c1e 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -59,6 +59,10 @@ struct nbd_device {
 	pid_t pid; /* pid of nbd-client, if attached */
 	int xmit_timeout;
 	int disconnect; /* a disconnect has been requested by user */
+
+	struct timer_list timeout_timer;
+	struct task_struct *task_recv;
+	struct task_struct *task_send;
 };

 #define NBD_MAGIC 0x68797548
@@ -121,6 +125,7 @@ static void sock_shutdown(struct nbd_device *nbd, int lock)
 		dev_warn(disk_to_dev(nbd->disk), "shutting down socket\n");
 		kernel_sock_shutdown(nbd->sock, SHUT_RDWR);
 		nbd->sock = NULL;
+		del_timer_sync(&nbd->timeout_timer);
 	}
 	if (lock)
 		mutex_unlock(&nbd->tx_lock);
@@ -128,11 +133,23 @@ static void sock_shutdown(struct nbd_device *nbd, int lock)

 static void nbd_xmit_timeout(unsigned long arg)
 {
-	struct task_struct *task = (struct task_struct *)arg;
+	struct nbd_device *nbd = (struct nbd_device *)arg;
+	struct task_struct *task;
+
+	if (list_empty(&nbd->queue_head))
+		return;
+
+	nbd->disconnect = 1;
+
+	task = READ_ONCE(nbd->task_recv);
+	if (task)
+		force_sig(SIGKILL, task);

-	printk(KERN_WARNING "nbd: killing hung xmit (%s, pid: %d)\n",
-		task->comm, task->pid);
-	force_sig(SIGKILL, task);
+	task = READ_ONCE(nbd->task_send);
+	if (task)
+		force_sig(SIGKILL, nbd->task_send);
+
+	dev_err(nbd_to_dev(nbd), "Connection timed out, killed receiver and sender, shutting down connection\n");
 }

 /*
@@ -171,33 +188,12 @@ static int sock_xmit(struct nbd_device *nbd, int send, void *buf, int size,
 		msg.msg_controllen = 0;
 		msg.msg_flags = msg_flags | MSG_NOSIGNAL;

-		if (send) {
-			struct timer_list ti;
-
-			if (nbd->xmit_timeout) {
-				init_timer(&ti);
-				ti.function = nbd_xmit_timeout;
-				ti.data = (unsigned long)current;
-				ti.expires = jiffies + nbd->xmit_timeout;
-				add_timer(&ti);
-			}
+		if (send)
 			result = kernel_sendmsg(sock, &msg, &iov, 1, size);
-			if (nbd->xmit_timeout)
-				del_timer_sync(&ti);
-		} else
+		else
 			result = kernel_recvmsg(sock, &msg, &iov, 1, size,
 						msg.msg_flags);

-		if (signal_pending(current)) {
-			siginfo_t info;
-			printk(KERN_WARNING "nbd (pid %d: %s) got signal %d\n",
-				task_pid_nr(current), current->comm
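Editorial illustration (not part of the forwarded patch): the timer rules from the commit message above can be modelled in a few lines of userspace C. This is a sketch under assumptions, not the driver code. `struct timeout_model` and the `model_*` functions are hypothetical names, time is a plain tick counter supplied by the caller instead of jiffies, and "shutting down the connection" is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the reply-based timeout described above. */
struct timeout_model {
	unsigned long timeout;	/* in ticks; 0 disables the timer */
	unsigned long expires;	/* tick at which the timer fires */
	bool armed;
	unsigned int pending;	/* requests still waiting for a reply */
	bool disconnected;
};

void model_init(struct timeout_model *m, unsigned long timeout)
{
	assert(m != NULL);
	m->timeout = timeout;
	m->expires = 0;
	m->armed = false;
	m->pending = 0;
	m->disconnected = false;
}

/* Re-arm the timer relative to "now", like mod_timer() in the patch. */
static void model_arm(struct timeout_model *m, unsigned long now)
{
	if (m->timeout == 0)
		return;
	m->expires = now + m->timeout;
	m->armed = true;
}

/* Case 1: the request list is empty and the first request goes out. */
void model_send(struct timeout_model *m, unsigned long now)
{
	if (m->pending == 0)
		model_arm(m, now);
	m->pending++;
}

/* Case 2: a server response arrived, so the timer is reset. */
void model_reply(struct timeout_model *m, unsigned long now)
{
	if (m->pending > 0)
		m->pending--;
	model_arm(m, now);
}

/*
 * Timer check. An expiry with an empty request list is harmless and returns
 * immediately; with requests outstanding the connection is declared dead.
 * Returns true only when this call marked the connection as disconnected.
 */
bool model_tick(struct timeout_model *m, unsigned long now)
{
	if (!m->armed || now < m->expires)
		return false;
	m->armed = false;
	if (m->pending == 0)
		return false;
	m->disconnected = true;
	return true;
}
```

With a timeout of 30 ticks, a request sent at tick 0 and answered at tick 10 moves the expiry to tick 40, where the timer fires harmlessly because nothing is pending. A second request sent at tick 50 with no reply makes `model_tick()` report a dead connection at tick 80.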
Bug#770479: linux-image-3.16.0-4-amd64: nbd timeout option is not working
Hello Ben,

On Mon, Apr 06, 2015 at 07:42:32PM +0100, Ben Hutchings wrote:
> Unfortunately this change has still not been committed by the nbd
> maintainers (at least, not anywhere that I can see) and we will not
> apply it until that happens. Perhaps you can remind them that this
> problem is still unfixed?

did that and now he told me, that the patch is queued for the 4.3 kernel.
If you still can't see it, I'll try to work it out.

> I'm increasing severity since this is a regression.

Thanks for that. I tested today the patch on the current jessie kernel
(3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1.1 (2015-07-16) x86_64 GNU/Linux)
and the timeout test with disconnected network works fine.

Thanks again, greetings
Hermann

--
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: hermann.la...@iwr.uni-heidelberg.de

--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: https://lists.debian.org/20150717102245.gb6...@lemon.iwr.uni-heidelberg.de
Bug#770479: [PATCH] nbd: Fix timeout detection
Status update from the nbd-maintainer:

> > Again timeout works as expected. So you may add a
> > Tested-By: hermann.la...@iwr.uni-heidelberg.de
> > if your patch is not already on the way upstream.
>
> Thanks for testing. The patch is not yet on its way upstream. I want to
> make some more tests.

Appended is the version of the patch from him which applies to jessie's kernel.
A test with that patched kernel shows the timeout issue is fixed and the md
raid1 device breaks up as expected:

[ 2269.104121] block nbd0: Attempted send on closed socket
[ 2269.109984] end_request: I/O error, dev nbd0, sector 0
[ 2269.115718] Buffer I/O error on device nbd0, logical block 0
[ 2269.122034] Buffer I/O error on device nbd0, logical block 1
[ 2269.128347] Buffer I/O error on device nbd0, logical block 2
[ 2269.134663] Buffer I/O error on device nbd0, logical block 3
[ 2269.140979] Buffer I/O error on device nbd0, logical block 4
[ 2269.147295] Buffer I/O error on device nbd0, logical block 5
[ 2269.153610] Buffer I/O error on device nbd0, logical block 6
[ 2269.159926] Buffer I/O error on device nbd0, logical block 7
[ 2269.166241] Buffer I/O error on device nbd0, logical block 8
[ 2269.172555] Buffer I/O error on device nbd0, logical block 9
[ 2269.178889] block nbd0: Attempted send on closed socket
[ 2269.184726] end_request: I/O error, dev nbd0, sector 0
[ 2269.190470] block nbd0: Attempted send on closed socket
[ 2269.196303] end_request: I/O error, dev nbd0, sector 2
[ 2269.202044] block nbd0: Attempted send on closed socket
[ 2269.207876] end_request: I/O error, dev nbd0, sector 4
[ 2269.213617] block nbd0: Attempted send on closed socket
[ 2269.219450] end_request: I/O error, dev nbd0, sector 6
[ 2308.414794] block nbd0: NBD_DISCONNECT
[ 2308.737866] nbd0: unknown partition table

Greetings
Hermann

--
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: hermann.la...@iwr.uni-heidelberg.de

From: Markus Pargmann m...@pengutronix.de
To: Hermann Lauer hermann.la...@iwr.uni-heidelberg.de
Cc: nbd-gene...@lists.sourceforge.net, linux-ker...@vger.kernel.org, ker...@pengutronix.de, Markus Pargmann m...@pengutronix.de, Michal Belczyk belc...@bsd.krakow.pl
Subject: [PATCH v4.0 for testing] nbd: Fix timeout detection
Date: Fri, 24 Apr 2015 09:35:33 +0200

At the moment the nbd timeout just detects hanging tcp operations. This
is not enough to detect a hanging or bad connection as expected of a
timeout.

This patch redesigns the timeout detection to include some more cases.
The timeout is now in relation to replies from the server. If the server
does not send replies within the timeout the connection will be shut
down.

The patch adds a continuous timer 'timeout_timer' that is set up in one of
two cases:
 - The request list is empty and we are sending the first request out to
   the server. We want to have a reply within the given timeout,
   otherwise we consider the connection to be dead.
 - A server response was received. This means the server is still
   communicating with us. The timer is reset to the timeout value.

The timer is not stopped if the list becomes empty. It will just trigger
a timeout which will directly leave the handling routine again as the
request list is empty.

The whole patch does not use any additional explicit locking. The
list_empty() calls are safe to be used concurrently. The timer is locked
internally as we just use mod_timer and del_timer_sync().

The patch is based on the idea of Michal Belczyk with a previous
different implementation.

Cc: Michal Belczyk belc...@bsd.krakow.pl
Cc: Hermann Lauer hermann.la...@iwr.uni-heidelberg.de
Signed-off-by: Markus Pargmann m...@pengutronix.de
[mpa: Backported to 4.0]
Signed-off-by: Markus Pargmann m...@pengutronix.de
---

This patch is only for testing at the moment. I will backport the final patch
for the stable tree at the end.

 drivers/block/nbd.c | 94 +
 include/linux/nbd.h | 4 +++
 2 files changed, 70 insertions(+), 28 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index a98c41f72c63..42c7601e91ee 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -133,6 +133,7 @@ static void sock_shutdown(struct nbd_device *nbd, int lock)
 		dev_warn(disk_to_dev(nbd->disk), "shutting down socket\n");
 		kernel_sock_shutdown(nbd->sock, SHUT_RDWR);
 		nbd->sock = NULL;
+		del_timer_sync(&nbd->timeout_timer);
 	}
 	if (lock)
 		mutex_unlock(&nbd->tx_lock);
@@ -140,11 +141,23 @@ static void sock_shutdown(struct nbd_device *nbd, int lock)

 static void nbd_xmit_timeout(unsigned long arg)
 {
-	struct task_struct *task = (struct task_struct *)arg;
+	struct nbd_device *nbd = (struct nbd_device *)arg;
+	struct task_struct *task;

-	printk(KERN_WARNING "nbd: killing hung xmit (%s, pid: %d)\n",
-		task->comm, task->pid);
-	force_sig(SIGKILL, task
Bug#768962: linux-image-3.2.0-4-sparc64-smp: [sparc] wheezy kernel hangs when loading i2c during boot
Package: src:linux Version: 3.2.63-2+deb7u1 Severity: normal Dear Maintainer, a regression in the hopefully still maintained wheezy sparc kernel: The older version linux-image-3.2.0-3-sparc64-smp boots without problems. This is a hard hang, break did not bring a prom prompt and a reset issued from the rsc is not working. Only powering off from rsc works after around 10 sec delay. Please ask if you need other information, will try a vanilla 3.17.2 kernel when the machine reboots the next time. Thanks and greetings Hermann -- console log Sun Fire 480R, No Keyboard Copyright 2005 Sun Microsystems, Inc. All rights reserved. OpenBoot 4.17.1, 14336 MB memory installed, Serial #xxx [0.00] PROMLIB: Sun IEEE Boot Prom 'OBP 4.17.1 2005/04/11 14:27' [0.00] PROMLIB: Root node compatible: [0.00] Initializing cgroup subsys cpuset [0.00] Initializing cgroup subsys cpu [0.00] Linux version 3.2.0-4-sparc64-smp (debian-kernel@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.63-2+deb7u1 [0.00] bootconsole [earlyprom0] enabled [0.00] ARCH: SUN4U [0.00] Ethernet address: 00:03:ba:2c:de:50 [0.00] Kernel: Using 2 locked TLB entries for main kernel image. [0.00] Remapping the kernel... done. [0.00] OF stdout device is: /pci@9,70/ebus@1/serial@1,40:a [0.00] PROM: Built device tree with 107523 bytes of memory. [0.00] Top of RAM: 0xb17fb38000, Total RAM: 0x37fb1e000 [0.00] Memory hole size: 712704MB [0.00] Zone PFN ranges: [0.00] Normal 0x0500 - 0x058bfd9c [0.00] Movable zone start PFN for each node [0.00] early_node_map[6] active PFN ranges [0.00] 0: 0x0500 - 0x0510 [0.00] 0: 0x0580 - 0x058bf7ff [0.00] 0: 0x058bf800 - 0x058bfd17 [0.00] 0: 0x058bfd19 - 0x058bfd83 [0.00] 0: 0x058bfd8b - 0x058bfd99 [0.00] 0: 0x058bfd9b - 0x058bfd9c [0.00] Booting Linux... 
[0.00] CPU CAPS: [flush,stbar,swap,muldiv,v9,ultra3,mul32,div32] [0.00] CPU CAPS: [v8plus,vis,vis2] [0.00] PERCPU: Embedded 6 pages/cpu @f8a00740 s20416 r8192 d20544 u1048576 [0.00] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 1762707 [0.00] Kernel command line: root=/dev/md0 ro [0.00] PID hash table entries: 4096 (order: 2, 32768 bytes) [0.00] Dentry cache hash table entries: 2097152 (order: 11, 16777216 bytes) [0.00] Inode-cache hash table entries: 1048576 (order: 10, 8388608 bytes) [0.00] Memory: 14518232k available (3600k kernel code, 1488k data, 232k init) [f800,00b17fb38000] [0.00] Hierarchical RCU implementation. [0.00] CONFIG_RCU_FANOUT set to non-default value of 32 [0.00] RCU dyntick-idle grace-period acceleration is enabled. [0.00] NR_IRQS:255 [0.00] clocksource: mult[6400] shift[24] [0.00] clockevent: mult[28f5c29] shift[32] [0.00] Console: colour dummy device 80x25 [0.00] console [tty0] enabled, bootconsole disabled [0.00] PROMLIB: Sun IEEE Boot Prom 'OBP 4.17.1 2005/04/11 14:27' [0.00] PROMLIB: Root node compatible: [0.00] Initializing cgroup subsys cpuset [0.00] Initializing cgroup subsys cpu [0.00] Linux version 3.2.0-4-sparc64-smp (debian-kernel@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.63-2+deb7u1 [0.00] bootconsole [earlyprom0] enabled [0.00] ARCH: SUN4U [0.00] Ethernet address: 00:03:ba:2c:de:50 [0.00] Kernel: Using 2 locked TLB entries for main kernel image. [0.00] Remapping the kernel... done. [0.00] OF stdout device is: /pci@9,70/ebus@1/serial@1,40:a [0.00] PROM: Built device tree with 107523 bytes of memory. 
[0.00] Top of RAM: 0xb17fb38000, Total RAM: 0x37fb1e000 [0.00] Memory hole size: 712704MB [0.00] Zone PFN ranges: [0.00] Normal 0x0500 - 0x058bfd9c [0.00] Movable zone start PFN for each node [0.00] early_node_map[6] active PFN ranges [0.00] 0: 0x0500 - 0x0510 [0.00] 0: 0x0580 - 0x058bf7ff [0.00] 0: 0x058bf800 - 0x058bfd17 [0.00] 0: 0x058bfd19 - 0x058bfd83 [0.00] 0: 0x058bfd8b - 0x058bfd99 [0.00] 0: 0x058bfd9b - 0x058bfd9c [0.00] Booting Linux... [0.00] CPU CAPS: [flush,stbar,swap,muldiv,v9,ultra3,mul32,div32] [0.00] CPU CAPS: [v8plus,vis,vis2] [0.00] PERCPU: Embedded 6 pages/cpu @f8a00740 s20416 r8192 d20544 u1048576 [0.00] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 1762707 [0.00] Kernel command line: root=/dev/md0 ro [0.00] PID hash table entries: 4096 (order: 2, 32768 bytes) [0.00] Dentry cache
Bug#516785: Bug #516785: linux-image-2.6.26-1-sparc64-smp: [sparc] SunFire480R cassini network driver kernel panic
On Tue, Jul 09, 2013 at 05:42:20PM +0200, Moritz Muehlenhoff wrote:
> No, a second machine of the same type is available now for testing - and
> also crashing after loading of the cassini driver. Here lspci and cpuinfo:
> ...
> 0002:00:02.0 Ethernet controller: Oracle Corporation Cassini 10/100/1000 (rev 11)
> 0003:00:01.0 Ethernet controller: Oracle Corporation Cassini 10/100/1000 (rev 11)
> ...
>
> Does this work with the wheezy release or later kernels?

Nope, tried 3.10.0 today - network worked for a short time, then a
Hardware FATAL RESET occurred.

Last suspicion was a chip issue with rev 11 cassini - there is one working
report with rev 20 chips only.

For the record: this was a 4 CPU 480R.

As usual console output is saved and could be provided, any other ideas are
welcome.

Thanks,
Hermann

--
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: hermann.la...@iwr.uni-heidelberg.de

Archive: http://lists.debian.org/20130723151912.ge1...@lemon.iwr.uni-heidelberg.de
Bug#516785: Bug #516785: linux-image-2.6.26-1-sparc64-smp: [sparc] SunFire480R cassini network driver kernel panic
On Mon, Jun 04, 2012 at 04:35:57PM +0200, Hermann Lauer wrote:
> On Sat, Jun 02, 2012 at 03:57:54AM +0800, Aron Xu wrote:
> > I have remote ssh access (root) to that running SunFire 408R, what can
> > I do to help you?
> ...
> > PS: I've disabled the rename function of udev and set hwaddress in
> > /etc/network/interfaces directly to work around the always changing
> > mac address.
>
> How to disable the renaming ? Will try to set the hwaddr during the next test.
> ...
> Wondering now if having only one cpu board with 2 cpus may be the problem.

No, a second machine of the same type is available now for testing - and
also crashing after loading of the cassini driver. Here lspci and cpuinfo:

:00:06.0 IDE interface: Silicon Image, Inc. PCI0646 (rev 07)
0002:00:01.0 Bridge: Oracle Corporation RIO EBUS (rev 01)
0002:00:01.3 USB Controller: Oracle Corporation RIO USB (rev 01)
0002:00:02.0 Ethernet controller: Oracle Corporation Cassini 10/100/1000 (rev 11)
0003:00:01.0 Ethernet controller: Oracle Corporation Cassini 10/100/1000 (rev 11)
0003:00:02.0 SCSI storage controller: QLogic Corp. QLA2200 64-bit Fibre Channel Adapter (rev 05)

cpu : TI UltraSparc III+ (Cheetah+)
fpu : UltraSparc III+ integrated FPU
pmu : ultra3+
prom: OBP 4.17.1 2005/04/11 14:27
type: sun4u
ncpus probed: 4
ncpus active: 4
D$ parity tl1 : 0
I$ parity tl1 : 0
cpucaps : flush,stbar,swap,muldiv,v9,ultra3,mul32,div32,v8plus,vis,vis2
Cpu0ClkTck : 35a4e900
Cpu1ClkTck : 35a4e900
Cpu2ClkTck : 35a4e900
Cpu3ClkTck : 35a4e900
MMU Type: Cheetah+
State:
CPU0: online
CPU1: online
CPU2: online
CPU3: online

This machine has 24G RAM, 16 on one board and 8 on the other. The first
machine has one board with 16G. So it may be a memory issue, as Aron has only
14G. Otherwise maybe the OBP version - here is the latest installed on all
machines.

Just to rule out a firmware issue:

$ md5sum /lib/firmware/sun/cassini.bin
fd11e09e8e61694353f12b3de376292a

Any further ideas to debug deeper ?

Thanks,
Hermann

Archive: http://lists.debian.org/20120627083124.ga7...@lemon.iwr.uni-heidelberg.de
Bug#665932: strace from vanilla kernel 3.2.17
See strace from vanilla 3.2.17 below, hope that helps.

Greetings
Hermann

# strace lshw
execve("/usr/bin/lshw", ["lshw"], [/* 16 vars */]) = 0
brk(0) = 0xa6000
uname({sys="Linux", node="tantalus", ...}) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf7fcc000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=28853, ...}) = 0
mmap(NULL, 28853, PROT_READ, MAP_PRIVATE, 3, 0) = 0xf7fc4000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/usr/lib/libstdc++.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\2\1\0\0\0\0\0\0\0\0\0\0\3\0\22\0\0\0\1\0\5(\0\0\0\0004"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=1094428, ...}) = 0
mmap(NULL, 1184336, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xf7e78000
mprotect(0xf7f7e000, 57344, PROT_NONE) = 0
mmap(0xf7f8c000, 32768, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x104000) = 0xf7f8c000
mmap(0xf7f94000, 21072, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xf7f94000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/libgcc_s.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\2\1\0\0\0\0\0\0\0\0\0\0\3\0\22\0\0\0\1\0\0!\340\0\0\0004"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=68880, ...}) = 0
mmap(NULL, 133496, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xf7e54000
mprotect(0xf7e66000, 57344, PROT_NONE) = 0
mmap(0xf7e74000, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1) = 0xf7e74000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/ultra3/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\2\1\0\0\0\0\0\0\0\0\0\0\3\0\22\0\0\0\1\0\2\7\340\0\0\0004"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1501772, ...}) = 0
mmap(NULL, 1572328, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xf7cd4000
mprotect(0xf7e3c000, 65536, PROT_NONE) = 0
mmap(0xf7e4c000, 24576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x168000) = 0xf7e4c000
mmap(0xf7e52000, 7656, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xf7e52000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/ultra3/libm.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\2\1\0\0\0\0\0\0\0\0\0\0\3\0\22\0\0\0\1\0\0\335`\0\0\0004"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=867768, ...}) = 0
mmap(NULL, 931744, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xf7bf
mprotect(0xf7cc, 57344, PROT_NONE) = 0
mmap(0xf7cce000, 24576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xce000) = 0xf7cce000
close(3) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf7fc2000
mprotect(0xf7cce000, 8192, PROT_READ) = 0
mprotect(0xf7e4c000, 8192, PROT_READ) = 0
mprotect(0xf7f8c000, 16384, PROT_READ) = 0
mprotect(0xf7fce000, 8192, PROT_READ) = 0
munmap(0xf7fc4000, 28853) = 0
brk(0) = 0xa6000
brk(0xc8000) = 0xc8000
access("/sys/class/.", F_OK) = 0
geteuid32() = 0
uname({sys="Linux", node="tantalus", ...}) = 0
ioctl(2, TCSETAF or SNDCTL_TMR_SELECT, {B38400 opost isig icanon echo ...}) = 0
) = 1
) = 5
open("/dev/mem", O_RDONLY) = 3
open("/proc/efi/systab", O_RDONLY) = -1 ENOENT (No such file or directory)
mmap(NULL, 32, PROT_READ, MAP_SHARED, 3, 0xe) = 0xf7fc8000

Message from syslogd@tantalus at Wed Jun 27 11:49:41 2012 ...
tantalus kernel: Press Stop-A (L1-A) to return to the boot prom

Message from syslogd@tantalus at Wed Jun 27 11:49:41 2012 ...
tantalus kernel: Kernel panic - not syncing: Irrecoverable deferred error trap.

Archive: http://lists.debian.org/20120627083449.gb7...@lemon.iwr.uni-heidelberg.de
Bug#516785: Bug #516785: linux-image-2.6.26-1-sparc64-smp: [sparc] SunFire480R cassini network driver kernel panic
On Sat, Jun 02, 2012 at 03:57:54AM +0800, Aron Xu wrote: I have remote ssh access (root) to that running SunFire 408R, what can I do to help you? ... PS: I've disabled the rename function of udev and set hwaddress in /etc/network/interfaces directly to work around the always changing mac address. How to disable the renaming ? Will try to set the hwaddr during the next test. Appended is the lspci and /proc/cpuinfo output. Using the 2.6.32-45 squeeze default image just did hang the machine when loading the cassini drivers again. Wondering now if having only one cpu board with 2 cpus may be the problem. Many thanks for your help, Hermann :00:03.0 SCSI storage controller: QLogic Corp. QLA2200 64-bit Fibre Channel Adapter (rev 05) :00:06.0 IDE interface: Silicon Image, Inc. PCI0646 (rev 07) 0001:00:01.0 PCI bridge: Digital Equipment Corporation DECchip 21154 (rev 05) 0001:00:02.0 Ethernet controller: Oracle Corporation GEM 10/100/1000 Ethernet [ge] (rev 01) 0001:01:04.0 SCSI storage controller: QLogic Corp. QLA2200 64-bit Fibre Channel Adapter (rev 05) 0001:01:05.0 SCSI storage controller: QLogic Corp. QLA2200 64-bit Fibre Channel Adapter (rev 05) 0002:00:01.0 Bridge: Oracle Corporation RIO EBUS (rev 01) 0002:00:01.3 USB Controller: Oracle Corporation RIO USB (rev 01) 0002:00:02.0 Ethernet controller: Oracle Corporation Cassini 10/100/1000 (rev 11) 0003:00:01.0 Ethernet controller: Oracle Corporation Cassini 10/100/1000 (rev 11) 0003:00:02.0 SCSI storage controller: QLogic Corp. 
QLA2200 64-bit Fibre Channel Adapter (rev 05) cpu : TI UltraSparc III+ (Cheetah+) fpu : UltraSparc III+ integrated FPU pmu : ultra3+ prom: OBP 4.22.34 2007/07/23 13:01 type: sun4u ncpus probed: 2 ncpus active: 2 D$ parity tl1 : 0 I$ parity tl1 : 0 cpucaps : flush,stbar,swap,muldiv,v9,ultra3,mul32,div32,v8plus,vis,vis2 Cpu0ClkTck : 35a4e900 Cpu2ClkTck : 35a4e900 MMU Type: Cheetah+ State: CPU0: online CPU2: online -- To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: http://lists.debian.org/20120604143557.gi2...@lemon.iwr.uni-heidelberg.de
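A side note on the /proc/cpuinfo output above: the Cpu0ClkTck and Cpu2ClkTck fields look like hexadecimal clock rates. Assuming the value is given in Hz (an assumption, the report does not say so), 35a4e900 decodes to 900 MHz. A minimal Python sketch of the conversion follows; the function names are made up for illustration.

```python
def clktck_to_mhz(hex_value: str) -> float:
    """Convert a ClkTck field (hexadecimal, assumed to be Hz) to MHz."""
    return int(hex_value, 16) / 1_000_000


def parse_clktck(lines):
    """Collect every 'CpuNClkTck : <hex>' line into a {field: MHz} dict."""
    result = {}
    for line in lines:
        key, _, value = line.partition(":")
        key = key.strip()
        if key.startswith("Cpu") and key.endswith("ClkTck"):
            result[key] = clktck_to_mhz(value.strip())
    return result


# The two fields as they appear in the cpuinfo output quoted above.
CPUINFO_LINES = [
    "Cpu0ClkTck : 35a4e900",
    "Cpu2ClkTck : 35a4e900",
]

if __name__ == "__main__":
    for cpu, mhz in parse_clktck(CPUINFO_LINES).items():
        print(f"{cpu}: {mhz:.0f} MHz")
```

If the Hz assumption holds, both CPUs on the board run at the same 900 MHz clock.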
Bug#516785: Bug #516785: linux-image-2.6.26-1-sparc64-smp: [sparc] SunFire480R cassini network driver kernel panic
On Tue, Mar 27, 2012 at 03:22:38PM +0100, Ben Hutchings wrote:
> On Tue, 2012-03-27 at 15:42 +0800, Aron Xu wrote:
> > I can confirm that Debian Squeeze 6.0.4, with kernel linux-image-2.6.32-5-sparc64-smp, version 2.6.32-41 or 2.6.32-41squeeze2, does not crash anymore. The installation process is
> Well I can't see any changes that might have fixed this. Maybe there's a difference between your machine and Hermann's?

Tried vanilla 3.4.0 and 3.3.7 today: 3.4.0 crashes in the md code, most probably unrelated. On 3.3.7, setting up the cassini driver hangs the machine, and after a while it resets itself, see below.

Aron, do you have a Sun Fire 480R? If yes, I'm interested in getting a running binary kernel from you to rule out configuration and compiler issues.

Thanks,
Hermann

tantalus:~# modprobe -v cassini cassini_debug=-1
WARNING: All config files need .conf: /etc/modprobe.d/local, it will be ignored in a future release.
insmod /lib/modules/3.3.7/kernel/drivers/net/ethernet/sun/cassini.ko cassini_debug=-1
cassini: cassini.c:v1.6 (21 May 2008)
cassini 0002:00:02.0: eth0: Sun Cassini+ (64bit/33MHz PCI/Cu) Ethernet[24] 00:03:ba:29:7c:a0
cassini 0003:00:01.0: eth1: Sun Cassini+ (64bit/66MHz PCI/Cu) Ethernet[30] 00:03:ba:29:7c:9f
tantalus:~#
udev[913]: renamed network interface eth0 to eth19
udev[914]: renamed network interface eth1 to eth20
tantalus:~# ifconfig eth19 129.206.xxx.xxx netmask 255.255.255.0 broadcast 129.206.xxx.255 up
cassini 0002:00:02.0: eth19: Link up at 1000 Mbps, full-duplex
cassini 0002:00:02.0: eth19: TX pause enabled
tantalus:~# route add default gw 129.206.xxx.xxx
tantalus:~#
Sun Fire 480R, No Keyboard
Copyright 2007 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.22.34, 16384 MB memory installed, Serial #

Archive: http://lists.debian.org/20120601150546.gh29...@lemon.iwr.uni-heidelberg.de
Bug#516785: Bug #516785: linux-image-2.6.26-1-sparc64-smp: [sparc] SunFire480R cassini network driver kernel panic
On Tue, Mar 27, 2012 at 03:22:38PM +0100, Ben Hutchings wrote:
> On Tue, 2012-03-27 at 15:42 +0800, Aron Xu wrote:
> > Hi, I can confirm that Debian Squeeze 6.0.4, with kernel linux-image-2.6.32-5-sparc64-smp, version 2.6.32-41 or 2.6.32-41squeeze2, does not crash anymore. The installation process is smooth (d-i prompts for a firmware), and the system is working well. But don't run lshw with this kernel, it may cause panic (#665932).
> Well I can't see any changes that might have fixed this. Maybe there's a difference between your machine and Hermann's?
> Hermann, what was the last kernel version where the cassini driver worked on this system? You originally reported that the problem started with 2.6.24 in 'etch-and-a-half'.

The short answer is: never. The driver always crashes the machine after a short time. As far as I remember from discussions with davem, the driver only worked on UP machines, which I can't simulate with the Sun Fire 480R as it has 2 CPU/board. A dual cassini Gigabit RJ45 is built in.

I'm on vanilla 3.2.12 at the moment and will test after Easter in the (not so much) spare time. If anybody with such a machine has it running, I'm interested to hear.

Thanks,
Hermann

-- 
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: hermann.la...@iwr.uni-heidelberg.de

Archive: http://lists.debian.org/20120328073811.ga20...@lemon.iwr.uni-heidelberg.de
Bug#516785: [sparc] SunFire480R cassini network driver kernel panic
On Tue, Aug 09, 2011 at 06:16:25AM +0200, Jonathan Nieder wrote:
> It seems that the pata_cmd64x bug has been fixed. cassini has received some fixes since 2.6.26, too, so if you get a chance to test 3.0.0-1, that would be interesting.

Compiled and tested cassini in vanilla 3.0: still crashing after ~10 pings with an

ERROR: System Hardware FATAL RESET from CPU0

The kernel messages while inserting the cassini driver are appended. Please ask if you would like to see the complete register dump after the crash (or anything else).

Compiling vanilla 3.0.1 with 3.0.0 and will test for completeness. Is 3.0.0-1 already in the sid sparc repo?

Thanks for the information and any help,
Hermann

Aug 9 16:26:56 tantalus kernel: cassini: cassini.c:v1.6 (21 May 2008)
Aug 9 16:26:56 tantalus kernel: PCI: Enabling device: (0002:00:02.0), cmd 146
Aug 9 16:26:56 tantalus kernel: cassini 0002:00:02.0: eth0: Sun Cassini+ (64bit/33MHz PCI/Cu) Ethernet[24] 00:03:ba:29:7c:a0
Aug 9 16:26:56 tantalus kernel: PCI: Enabling device: (0003:00:01.0), cmd 146
Aug 9 16:26:57 tantalus kernel: cassini 0003:00:01.0: eth1: Sun Cassini+ (64bit/66MHz PCI/Cu) Ethernet[30] 00:03:ba:29:7c:9f
Aug 9 16:26:57 tantalus kernel: udev[917]: renamed network interface eth0 to eth19
Aug 9 16:26:58 tantalus kernel: udev[918]: renamed network interface eth1 to eth20

-- 
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: hermann.la...@iwr.uni-heidelberg.de

Archive: http://lists.debian.org/20110809144347.ga2...@lemon.iwr.uni-heidelberg.de
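Because udev renames the cassini ports on every load, it can be handy to see which MAC address ended up behind which final interface name. The following is only a rough Python sketch, written against the kern.log lines quoted in the message above; the regular expressions and the helper name `map_interfaces` are my own and are not part of any tool mentioned in this report.

```python
import re

# Matches the cassini probe message, e.g.
# "cassini 0002:00:02.0: eth0: Sun Cassini+ (...) Ethernet[24] 00:03:ba:29:7c:a0"
PROBE_RE = re.compile(
    r"cassini (?P<pci>\S+): (?P<iface>eth\d+): .*Ethernet\[\d+\] "
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})"
)
# Matches the udev rename message, e.g.
# "udev[917]: renamed network interface eth0 to eth19"
RENAME_RE = re.compile(
    r"renamed network interface (?P<old>eth\d+) to (?P<new>eth\d+)"
)


def map_interfaces(log_lines):
    """Return {final interface name: MAC}, following udev renames in log order."""
    macs = {}
    for line in log_lines:
        probe = PROBE_RE.search(line)
        if probe:
            macs[probe["iface"]] = probe["mac"]
            continue
        rename = RENAME_RE.search(line)
        if rename and rename["old"] in macs:
            macs[rename["new"]] = macs.pop(rename["old"])
    return macs


# Sample input: the kernel messages quoted in the message above.
KERN_LOG = [
    "Aug 9 16:26:56 tantalus kernel: cassini: cassini.c:v1.6 (21 May 2008)",
    "Aug 9 16:26:56 tantalus kernel: PCI: Enabling device: (0002:00:02.0), cmd 146",
    "Aug 9 16:26:56 tantalus kernel: cassini 0002:00:02.0: eth0: Sun Cassini+ (64bit/33MHz PCI/Cu) Ethernet[24] 00:03:ba:29:7c:a0",
    "Aug 9 16:26:56 tantalus kernel: PCI: Enabling device: (0003:00:01.0), cmd 146",
    "Aug 9 16:26:57 tantalus kernel: cassini 0003:00:01.0: eth1: Sun Cassini+ (64bit/66MHz PCI/Cu) Ethernet[30] 00:03:ba:29:7c:9f",
    "Aug 9 16:26:57 tantalus kernel: udev[917]: renamed network interface eth0 to eth19",
    "Aug 9 16:26:58 tantalus kernel: udev[918]: renamed network interface eth1 to eth20",
]

if __name__ == "__main__":
    for iface, mac in sorted(map_interfaces(KERN_LOG).items()):
        print(iface, mac)
```

The sketch only understands the message format of the 3.0 log; the older 2.6.33 messages lack the "cassini <pci-id>:" prefix and would need a different pattern.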
Bug#622745: linux-image-2.6-sparc64-smp: squeeze provided kernels did not boot on Sun Fire 480R or 880
Package: linux-image-2.6-sparc64-smp Version: 2.6.38+33 Severity: critical Tags: d-i upstream Justification: breaks the whole system As discussed in http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=516785, there is no working kernel in squeeze available for this class of machines. vanilla 2.6.38.2 compiles and runs (with minor glitches) on 480R and 880. After Moritz's hint in #516785 appended is the result of a boot try of 2.6.38 from wheezy. Thanks, Hermann -- System Information: Debian Release: 6.0.1 APT prefers stable APT policy: (500, 'stable'), (50, 'testing') Architecture: sparc (sparc64) Kernel: Linux 2.6.38.2 (SMP w/1 CPU core) Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) Shell: /bin/sh linked to /bin/dash Versions of packages linux-image-2.6-sparc64-smp depends on: ii linux-image-2.6.38-2-sparc64- 2.6.38-3 Linux 2.6.38 for multiprocessor 64 linux-image-2.6-sparc64-smp recommends no packages. linux-image-2.6-sparc64-smp suggests no packages. -- no debconf information [0.00] PROMLIB: Sun IEEE Boot Prom 'OBP 4.22.34 2007/07/23 13:01' [0.00] PROMLIB: Root node compatible: [0.00] Initializing cgroup subsys cpuset [0.00] Initializing cgroup subsys cpu [0.00] Linux version 2.6.38-2-sparc64-smp (Debian 2.6.38-3) (b...@decadent.org.uk) (gcc version 4.4.5 (Debian 4.4.5-15) ) #1 SMP Thu Apr 7 06:53:29 UTC 2011 [0.00] bootconsole [earlyprom0] enabled [0.00] ARCH: SUN4U [0.00] Ethernet address: 00:03:ba:29:7c:9f [0.00] Kernel: Using 2 locked TLB entries for main kernel image. [0.00] Remapping the kernel... done. [0.00] OF stdout device is: /pci@9,70/ebus@1/serial@1,40:a [0.00] PROM: Built device tree with 103627 bytes of memory. 
[0.00] Top of RAM: 0xa3ffb22000, Total RAM: 0x3ffb0e000 [0.00] Memory hole size: 655360MB [0.00] [01014000-f8a000c0] page_structs=131072 node=0 entry=1280/8192 [0.00] [01014000-f8a00100] page_structs=131072 node=0 entry=1281/8192 [0.00] [01014080-f8a00140] page_structs=131072 node=0 entry=1282/8192 [0.00] [01014080-f8a00180] page_structs=131072 node=0 entry=1283/8192 [0.00] [01014100-f8a001c0] page_structs=131072 node=0 entry=1284/8192 [0.00] [01014100-f8a00200] page_structs=131072 node=0 entry=1285/8192 [0.00] [01014180-f8a00240] page_structs=131072 node=0 entry=1286/8192 [0.00] [01014180-f8a00280] page_structs=131072 node=0 entry=1287/8192 [0.00] [01014200-f8a002c0] page_structs=131072 node=0 entry=1288/8192 [0.00] [01014200-f8a00300] page_structs=131072 node=0 entry=1289/8192 [0.00] [01014280-f8a00340] page_structs=131072 node=0 entry=1290/8192 [0.00] [01014280-f8a00380] page_structs=131072 node=0 entry=1291/8192 [0.00] [01014300-f8a003c0] page_structs=131072 node=0 entry=1292/8192 [0.00] [01014300-f8a00400] page_structs=131072 node=0 entry=1293/8192 [0.00] [01014380-f8a00440] page_structs=131072 node=0 entry=1294/8192 [0.00] [01014380-f8a00480] page_structs=131072 node=0 entry=1295/8192 [0.00] [01014400-f8a004c0] page_structs=131072 node=0 entry=1296/8192 [0.00] [01014400-f8a00500] page_structs=131072 node=0 entry=1297/8192 [0.00] [01014480-f8a00540] page_structs=131072 node=0 entry=1298/8192 [0.00] [01014480-f8a00580] page_structs=131072 node=0 entry=1299/8192 [0.00] [01014500-f8a005c0] page_structs=131072 node=0 entry=1300/8192 [0.00] [01014500-f8a00600] page_structs=131072 node=0 entry=1301/8192 [0.00] [01014580-f8a00640] page_structs=131072 node=0 entry=1302/8192 [0.00] [01014580-f8a00680] page_structs=131072 node=0 entry=1303/8192 [0.00] [01014600-f8a006c0] page_structs=131072 node=0 entry=1304/8192 [0.00] [01014600-f8a00700] page_structs=131072 node=0 entry=1305/8192 [0.00] [01014680-f8a00740] page_structs=131072 node=0 entry=1306/8192 [0.00] 
[01014680-f8a00780] page_structs=131072 node=0 entry=1307/8192 [0.00] [01014700-f8a007c0] page_structs=131072 node=0 entry=1308/8192 [0.00] [01014700-f8a00800] page_structs=131072 node=0 entry=1309/8192 [0.00] [01014780-f8a00840] page_structs=131072 node=0 entry=1310/8192 [0.00] [01014780-f8a00880] page_structs=131072 node=0 entry=1311/8192 [0.00] Zone PFN ranges: [0.00] Normal
Bug#516785: Sun Fire 480R booting now [Was: Re: linux-image-2.6.26-1-sparc64-smp: [sparc] SunFire480R cassini network driver kernel panic]
On Wed, Apr 13, 2011 at 09:01:27PM +0200, Moritz Mühlenhoff wrote:
> > For the debian sparc list: Hoping that wheezy will get a 2.6.38 kernel.
> A 2.6.38 based kernel image is already available in wheezy/testing. Please test it and report, if it also works.

Refiled this at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=622745, as I feel general hangs should not fill up the cassini report here.

Greetings
Hermann

Archive: http://lists.debian.org/20110414113212.gb13...@lemon.iwr.uni-heidelberg.de
Bug#516785: linux-image-2.6.26-1-sparc64-smp: [sparc] SunFire480R cassini network driver kernel panic
On Fri, Apr 01, 2011 at 09:42:06PM +0200, Moritz Mühlenhoff wrote:
> On Mon, Mar 01, 2010 at 12:30:05PM +0100, Hermann Lauer wrote:
> > On Wed, Jan 13, 2010 at 09:29:25AM +0100, Hermann Lauer wrote:
> > > I tried again with vanilla 2.6.32 (2.6.27-2.6.31 are unusable due to a kernel memory corruption bug), the driver is still crashing my Sunfire V480R.
> > Tried today vanilla 2.6.33, then did a:
> > # modprobe -v cassini cassini_debug=-1
> > insmod /lib/modules/2.6.33/kernel/drivers/net/cassini.ko cassini_debug=-1
> > In kern.log the messages appeared:
> > Mar 1 12:17:14 tantalus kernel: cassini.c:v1.6 (21 May 2008)
> > Mar 1 12:17:14 tantalus kernel: PCI: Enabling device: (0002:00:02.0), cmd 146
> > Mar 1 12:17:15 tantalus kernel: cassini: MAC address not found in ROM VPD
> > Mar 1 12:17:15 tantalus kernel: eth0: Sun Cassini+ (64bit/33MHz PCI/Cu) Ethernet[24] 08:00:20:cb:31:01
> > Mar 1 12:17:15 tantalus kernel: PCI: Enabling device: (0003:00:01.0), cmd 146
> > Mar 1 12:17:15 tantalus kernel: cassini: MAC address not found in ROM VPD
> > Mar 1 12:17:15 tantalus kernel: udev: renamed network interface eth0 to eth15
> > Mar 1 12:17:15 tantalus kernel: eth0: Sun Cassini+ (64bit/66MHz PCI/Cu) Ethernet[30] 08:00:20:bc:c7:b7
> > Mar 1 12:17:15 tantalus kernel: udev: renamed network interface eth0 to eth16
> > Mar 1 12:20:48 tantalus kernel: eth15: Link up at 1000 Mbps, full-duplex.
> > Mar 1 12:20:48 tantalus kernel: eth15: TX pause enabled
> > Mar 1 12:20:59 tantalus kernel: eth15: no IPv6 routers present
> > As before, setting up the interface with ifconfig works. After sending out 68 pings the machine crashed with the usual Hardware FATAL RESET. What can be done to debug this further ?
> Is this fixed in later kernels, e.g. the 2.6.38 from Debian unstable? If so and the fix can be isolated we can fix it in 2.6.32 for Squeeze.

Unfortunately squeeze provided kernels are not booting on V880/V480 systems (hangs at boot, log files are available on request). Will try install boots from testing when time permits.
Or are there other tftpbootable images I could/should try? That's the major issue here at the moment, cassini has to wait...

Thanks for caring,
Hermann

-- 
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: hermann.la...@iwr.uni-heidelberg.de

Archive: http://lists.debian.org/20110404151709.gc10...@lemon.iwr.uni-heidelberg.de
Bug#516785: linux-image-2.6.26-1-sparc64-smp: [sparc] SunFire480R cassini network driver kernel panic
On Wed, Jan 13, 2010 at 09:29:25AM +0100, Hermann Lauer wrote: I tried again with vanilla 2.6.32 (2.6.27-2.6.31 are unusable due to a kernel memory corruption bug), the driver is still crashing my Sunfire V480R. Tried today vanilla 2.6.33, then did a: # modprobe -v cassini cassini_debug=-1 insmod /lib/modules/2.6.33/kernel/drivers/net/cassini.ko cassini_debug=-1 In kern.log the messages appeared: Mar 1 12:17:14 tantalus kernel: cassini.c:v1.6 (21 May 2008) Mar 1 12:17:14 tantalus kernel: PCI: Enabling device: (0002:00:02.0), cmd 146 Mar 1 12:17:15 tantalus kernel: cassini: MAC address not found in ROM VPD Mar 1 12:17:15 tantalus kernel: eth0: Sun Cassini+ (64bit/33MHz PCI/Cu) Ethernet[24] 08:00:20:cb:31:01 Mar 1 12:17:15 tantalus kernel: PCI: Enabling device: (0003:00:01.0), cmd 146 Mar 1 12:17:15 tantalus kernel: cassini: MAC address not found in ROM VPD Mar 1 12:17:15 tantalus kernel: udev: renamed network interface eth0 to eth15 Mar 1 12:17:15 tantalus kernel: eth0: Sun Cassini+ (64bit/66MHz PCI/Cu) Ethernet[30] 08:00:20:bc:c7:b7 Mar 1 12:17:15 tantalus kernel: udev: renamed network interface eth0 to eth16 Mar 1 12:20:48 tantalus kernel: eth15: Link up at 1000 Mbps, full-duplex. Mar 1 12:20:48 tantalus kernel: eth15: TX pause enabled Mar 1 12:20:59 tantalus kernel: eth15: no IPv6 routers present As before, setting up the interface with ifconfig works. After sending out 68 pings the machine crashed with the usual Hardware FATAL RESET. What can be done to debug this further ? Thanks, Hermann ...registerdump omitted... 2:0Device Type == 5 NUM OF BITS IN CSR/EMU Reg in device# 60 == 15 BITS [14..0] = Initiating Hardware FATAL RESET recovery Resetting... RSC Alert: Host System has Reset ERROR: Hardware FATAL RESET Recovery Executing POST w/%o0 = .0800.0201.4041 0:0 0:0@(#)Sun Fire[TM] V480/V490 POST 4.22.34 2007/07/23 13:12 /export/delivery/delivery/4.22/4.22.34/post4.22.x/Camelot/cstone/integrated (root) 0:0Copyright 2007 Sun Microsystems, Inc. 
All rights reserved -- Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224 Email: hermann.la...@iwr.uni-heidelberg.de -- To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: http://lists.debian.org/20100301113005.ga2...@lemon.iwr.uni-heidelberg.de
Bug#516785: linux-image-2.6.26-1-sparc64-smp: [sparc] SunFire480R cassini network driver kernel panic
On Tue, Jan 12, 2010 at 10:35:53PM +0100, Moritz Muehlenhoff wrote:
> The conversation with Dave petered out, was there further discussion/investigation?

I tried again with vanilla 2.6.32 (2.6.27-2.6.31 are unusable due to a kernel memory corruption bug); the driver is still crashing my Sunfire V480R. I have no time to enable debugging and test more at the moment, but will try later.

Greetings
Hermann

-- 
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: hermann.la...@iwr.uni-heidelberg.de
Bug#550863: linux-image-2.6.26-2-amd64: kernel did not propagate I/O errors up after nbd server disconnection
Package: linux-image-2.6.26-2-amd64
Version: 2.6.26-19
Severity: important
Tags: patch

Using a RAID1 on top of a network block device (/dev/nbd0) with the lenny kernel is impossible: once the nbd-client has died after a network disconnection, the kernel hangs on a "cat /proc/mdstat" and on md accesses.

Wouter Verhelst wrote (see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=549904):
> As I suspected, this is a kernel bug. The good news is that Paul Clements (who maintains the kernel side of NBD) already fixed it back in February; the bad news is that your kernel is too old to contain the fix.

Please consider for inclusion in lenny updates:

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 34f80fa..8299e2d 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -549,6 +549,15 @@ static void do_nbd_request(struct request_queue * q)
 
 		BUG_ON(lo->magic != LO_MAGIC);
 
+		if (unlikely(!lo->sock)) {
+			printk(KERN_ERR "%s: Attempted send on closed socket\n",
+				lo->disk->disk_name);
+			req->errors++;
+			nbd_end_request(req);
+			spin_lock_irq(q->queue_lock);
+			continue;
+		}
+
 		spin_lock_irq(&lo->queue_lock);
 		list_add_tail(&req->queuelist, &lo->waiting_queue);
 		spin_unlock_irq(&lo->queue_lock);

-- Package-specific info:
** Version:
Linux version 2.6.26-2-amd64 (Debian 2.6.26-19) (da...@debian.org) (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 SMP Wed Aug 19 22:33:18 UTC 2009

** Command line:
root=/dev/cciss/c0d0p1 ro console=ttyS1,9600n8

** Not tainted

** Kernel log:
[451246.85] nbd0: Receive control failed (result -32)
[451246.922870] nbd0: shutting down socket
[451246.999872] nbd0: queue cleared
[564563.472849] nbd: unregistered device at major 43
[564596.569648] nbd: no symbol version for struct_module
[564713.513829] nbd: no symbol version for struct_module
[569352.799915] nbd: no symbol version for struct_module
[570364.726383] nbd: no symbol version for struct_module
[570403.076045] nbd: no symbol version for struct_module
[573363.954150] nbd: no symbol
version for struct_module [573412.282535] nbd: no symbol version for struct_module [573698.837718] nbd: no symbol version for struct_module [573709.102217] nbd: no symbol version for struct_module [602361.639441] nbd: disagrees about version of symbol struct_module [920487.726349] nbd: registered device at major 43 [920638.283851] nbd: unregistered device at major 43 [920645.149820] nbd: disagrees about version of symbol struct_module [923450.730250] nbd: disagrees about version of symbol struct_module [929276.274395] nbd: disagrees about version of symbol struct_module [91.561760] nbd: disagrees about version of symbol struct_module [1036996.196175] nbd: registered device at major 43 [1037185.485510] nbd: unregistered device at major 43 [1037812.007327] nbd: registered device at major 43 [1037959.495816] md: bindnbd0 [1037959.588225] RAID1 conf printout: [1037959.631504] --- wd:1 rd:2 [1037959.674903] disk 0, wo:0, o:1, dev:cciss/c0d1 [1037959.767917] disk 1, wo:1, o:1, dev:nbd0 [1037959.811901] md: recovery of RAID array md1 [1037960.065164] md: minimum _guaranteed_ speed: 1000 KB/sec/disk. [1037960.134163] md: using maximum available idle IO bandwidth (but not more than 20 KB/sec) for recovery. [1037960.246181] md: using 128k window, over a total of 292935872 blocks. 
[1038236.501929] nbd0: Receive control failed (result -104) [1038236.562381] nbd0: shutting down socket [1038236.620862] end_request: I/O error, dev nbd0, sector 0 [1038236.664847] Buffer I/O error on device nbd0, logical block 0 [1038236.841025] Buffer I/O error on device nbd0, logical block 1 [1038236.841025] Buffer I/O error on device nbd0, logical block 2 [1038237.045035] Buffer I/O error on device nbd0, logical block 3 [1038237.045035] Buffer I/O error on device nbd0, logical block 4 [1038237.045035] Buffer I/O error on device nbd0, logical block 5 [1038237.045035] Buffer I/O error on device nbd0, logical block 6 [1038237.045035] Buffer I/O error on device nbd0, logical block 7 [1038237.045035] Buffer I/O error on device nbd0, logical block 8 [1038237.045035] Buffer I/O error on device nbd0, logical block 9 [1038238.052038] end_request: I/O error, dev nbd0, sector 28722432 [1038238.096036] raid1: Disk failure on nbd0, disabling device. [1038238.096036] raid1: Operation continuing on 1 devices. [1038238.289044] end_request: I/O error, dev nbd0, sector 28722304 [1038238.440229] nbd0: Attempted send on closed socket [1038238.440234] end_request: I/O error, dev nbd0, sector 28722176 [1038238.440242] end_request: I/O error, dev nbd0, sector 28722048 [1038238.440253] end_request: I/O error, dev nbd0, sector 28721920 [1038238.440259] end_request: I/O error, dev nbd0, sector 28721792 [1038238.440270] end_request: I/O
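To illustrate what the patch quoted in this report changes, here is a toy model in Python. It is not kernel code and makes no claim about how nbd.c is structured beyond the quoted hunk; the class `ToyNbdDevice`, the function `submit_requests` and the `fail_fast` switch are hypothetical names chosen for this sketch. The idea shown is only this: with the check in place, a request arriving after the socket is gone is completed with an error right away instead of being parked on a queue that nobody will ever service.

```python
from collections import deque


class ToyNbdDevice:
    """Toy stand-in for the per-device nbd state (hypothetical, not kernel code)."""

    def __init__(self, name, connected=True):
        self.name = name
        self.connected = connected    # models lo->sock being non-NULL
        self.waiting_queue = deque()  # models lo->waiting_queue
        self.log = []


def submit_requests(dev, requests, fail_fast=True):
    """Return a list of (request, status) pairs.

    fail_fast=True models the patched behaviour: requests on a closed
    socket are ended with an error. fail_fast=False models the old
    behaviour: they are queued and wait for a sender that is gone.
    """
    results = []
    for req in requests:
        if fail_fast and not dev.connected:
            dev.log.append(f"{dev.name}: Attempted send on closed socket")
            results.append((req, "error"))
            continue
        dev.waiting_queue.append(req)
        results.append((req, "queued"))
    return results


if __name__ == "__main__":
    patched = ToyNbdDevice("nbd0", connected=False)
    print(submit_requests(patched, ["read sector 0"], fail_fast=True))
    print(patched.log)

    unpatched = ToyNbdDevice("nbd0", connected=False)
    print(submit_requests(unpatched, ["read sector 0"], fail_fast=False))
    print(len(unpatched.waiting_queue), "request(s) stuck in the queue")
```

In the model, the error status is what lets an upper layer such as md notice the failure and carry on with the remaining disk, which matches the "raid1: Disk failure on nbd0, disabling device" lines in the log above.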
Bug#516785: linux-image-2.6.26-1-sparc64-smp: [sparc] SunFire480R cassini network driver kernel panic
Package: linux-image-2.6.26-1-sparc64-smp Version: 2.6.26-13 Severity: important lenny 2.6.26 kernel crashes on using the buildin cassini 1000T network ports on a 2 processor SunFire 480R. 2.6.24 etchnhalf already had the same problem See also the thread with DaveM at: http://marc.info/?l=linux-sparcm=122491191714208w=2 -- System Information: Debian Release: 5.0 APT prefers stable APT policy: (500, 'stable') Architecture: sparc (sparc64) Kernel: Linux 2.6.26-1-sparc64-smp (SMP w/1 CPU core) Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) Shell: /bin/sh linked to /bin/bash Versions of packages linux-image-2.6.26-1-sparc64-smp depends on: ii debconf [debconf-2.0] 1.5.24 Debian configuration management sy ii initramfs-tools [linux-initra 0.92o tools for generating an initramfs ii module-init-tools 3.4-1 tools for managing Linux kernel mo linux-image-2.6.26-1-sparc64-smp recommends no packages. Versions of packages linux-image-2.6.26-1-sparc64-smp suggests: pn fdutilsnone(no description available) pn linux-doc-2.6.26 none(no description available) ii silo 1.4.13a+git20070930-3 Sparc Improved LOader -- debconf information: linux-image-2.6.26-1-sparc64-smp/preinst/failed-to-move-modules-2.6.26-1-sparc64-smp: linux-image-2.6.26-1-sparc64-smp/prerm/would-invalidate-boot-loader-2.6.26-1-sparc64-smp: true shared/kernel-image/really-run-bootloader: true linux-image-2.6.26-1-sparc64-smp/preinst/already-running-this-2.6.26-1-sparc64-smp: linux-image-2.6.26-1-sparc64-smp/preinst/abort-install-2.6.26-1-sparc64-smp: linux-image-2.6.26-1-sparc64-smp/postinst/kimage-is-a-directory: linux-image-2.6.26-1-sparc64-smp/postinst/depmod-error-2.6.26-1-sparc64-smp: false linux-image-2.6.26-1-sparc64-smp/preinst/elilo-initrd-2.6.26-1-sparc64-smp: true linux-image-2.6.26-1-sparc64-smp/postinst/bootloader-error-2.6.26-1-sparc64-smp: linux-image-2.6.26-1-sparc64-smp/preinst/lilo-initrd-2.6.26-1-sparc64-smp: true 
linux-image-2.6.26-1-sparc64-smp/postinst/old-system-map-link-2.6.26-1-sparc64-smp: true linux-image-2.6.26-1-sparc64-smp/preinst/initrd-2.6.26-1-sparc64-smp: linux-image-2.6.26-1-sparc64-smp/prerm/removing-running-kernel-2.6.26-1-sparc64-smp: true linux-image-2.6.26-1-sparc64-smp/postinst/create-kimage-link-2.6.26-1-sparc64-smp: true linux-image-2.6.26-1-sparc64-smp/preinst/abort-overwrite-2.6.26-1-sparc64-smp: linux-image-2.6.26-1-sparc64-smp/postinst/old-initrd-link-2.6.26-1-sparc64-smp: true linux-image-2.6.26-1-sparc64-smp/postinst/old-dir-initrd-link-2.6.26-1-sparc64-smp: true linux-image-2.6.26-1-sparc64-smp/postinst/depmod-error-initrd-2.6.26-1-sparc64-smp: false linux-image-2.6.26-1-sparc64-smp/preinst/overwriting-modules-2.6.26-1-sparc64-smp: true linux-image-2.6.26-1-sparc64-smp/postinst/bootloader-test-error-2.6.26-1-sparc64-smp: linux-image-2.6.26-1-sparc64-smp/preinst/bootloader-initrd-2.6.26-1-sparc64-smp: true linux-image-2.6.26-1-sparc64-smp/preinst/lilo-has-ramdisk: -- To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org