Bug#851432: installation-reports: reboot or halt hangs after unmounting all disks on active update task
Package: installation-reports
Severity: normal

Dear Maintainer,

*** Reporter, please consider answering these questions, where appropriate ***

* What led up to the situation?
==> Any reboot or halt triggers it.
==> At shutdown the system goes through all needed actions and unmounts the
remote and local disks, then waits for an update action to finish (which can
never finish, as all disks have been unmounted). The stated wait time is
15 minutes and some seconds, but I did not wait that long.

* What exactly did you do (or not do) that was effective (or ineffective)?
==> "Solved" by doing a hardware switch-off (power button held for 4 seconds).

* What was the outcome of this action?
==> System power-off.

* What outcome did you expect instead?
==> A reboot should reboot, and not result in a long wait.

*** End of the template - remove these template lines ***

-- Package-specific info:

Boot method: network for install, after that local disk (SSD)
Image version: netboot, build date 11 Nov 2016
Date: Installed Nov 2016, with all updates since.

Machine: Dell Precision T7500
Partitions:

Base System Installation Checklist:
[O] = OK, [E] = Error (please elaborate below), [ ] = didn't try it

Initial boot:           [O]
Detect network card:    [O]
Configure network:      [O]
Detect CD:              [O]
Load installer modules: [O]
Clock/timezone setup:   [O]
User/password setup:    [O]
Detect hard drives:     [O]
Partition hard drives:  [O]
Install base system:    [O]
Install tasks:          [O]
Install boot loader:    [O]
Overall install:        [O]

Comments/Problems:

Install seems OK; shutdown is KO.

The system is normally either active or in RAM sleep. I did not do a
shutdown, as the system typically crashed before. When starting to debug
that, and looking for other ongoing issues, the problem on shutdown was
found. The ongoing issue is not related, and seems to be an initialisation
issue of the graphics subsystem.

-- 

Please make sure that the hardware-summary log file, and any other
installation logs that you think would be useful, are attached to this
report.
Please compress large files using gzip.

Once you have filled out this report, mail it to sub...@bugs.debian.org.

== Installer lsb-release: ==
DISTRIB_ID=Debian
DISTRIB_DESCRIPTION="Debian GNU/Linux installer"
DISTRIB_RELEASE="9 (stretch) - installer build 20161101-00:08"
X_INSTALLATION_MEDIUM=netboot

== Installer hardware-summary: ==
uname -a: Linux cenedra 4.7.0-1-amd64 #1 SMP Debian 4.7.8-1 (2016-10-19) x86_64 GNU/Linux
lspci -knn: 00:00.0 Host bridge [0600]: Intel Corporation 5520 I/O Hub to ESI Port [8086:3406] (rev 13)
lspci -knn: 	Subsystem: Dell Device [1028:026d]
lspci -knn: 00:01.0 PCI bridge [0604]: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 [8086:3408] (rev 13)
lspci -knn: 	Kernel driver in use: pcieport
lspci -knn: 00:03.0 PCI bridge [0604]: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 [8086:340a] (rev 13)
lspci -knn: 	Kernel driver in use: pcieport
lspci -knn: 00:07.0 PCI bridge [0604]: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 [8086:340e] (rev 13)
lspci -knn: 	Kernel driver in use: pcieport
lspci -knn: 00:14.0 PIC [0800]: Intel Corporation 7500/5520/5500/X58 I/O Hub System Management Registers [8086:342e] (rev 13)
lspci -knn: 	Subsystem: Device [0028:006d]
lspci -knn: 00:14.1 PIC [0800]: Intel Corporation 7500/5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers [8086:3422] (rev 13)
lspci -knn: 	Subsystem: Device [0028:006d]
lspci -knn: 00:14.2 PIC [0800]: Intel Corporation 7500/5520/5500/X58 I/O Hub Control Status and RAS Registers [8086:3423] (rev 13)
lspci -knn: 	Subsystem: Device [0028:006d]
lspci -knn: 00:1a.0 USB controller [0c03]: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4 [8086:3a37]
lspci -knn: 	Subsystem: Dell Device [1028:026d]
lspci -knn: 	Kernel driver in use: uhci_hcd
lspci -knn: 	Kernel modules: uhci_hcd
lspci -knn: 00:1a.1 USB controller [0c03]: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5 [8086:3a38]
lspci -knn: 	Subsystem: Dell Device [1028:026d]
lspci -knn: 	Kernel driver in use: uhci_hcd
lspci -knn: 	Kernel modules: uhci_hcd
lspci -knn: 00:1a.2 USB controller [0c03]: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6 [8086:3a39]
lspci -knn: 	Subsystem: Dell Device [1028:026d]
lspci -knn: 	Kernel driver in use: uhci_hcd
lspci -knn: 	Kernel modules: uhci_hcd
lspci -knn: 00:1a.7 USB controller [0c03]: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2 [8086:3a3c]
lspci -knn: 	Subsystem: Dell Device [1028:026d]
lspci -knn: 	Kernel driver in use: ehci-pci
lspci -knn: 	Kernel modules: ehci_pci
lspci -knn: 00:1b.0 Audio device [0403]: Intel Corporation
Bug#678519: routing wedged after 1 month
Dears,

I think I have solved it. I will know for certain in a bit more than a month, though.

Apparently aiccu goes off the track when the system time is more than a certain number of seconds off. At least with the clock off by more than 145 seconds it will refuse to start. When the situation occurs while it is running, it apparently kills the IPv6 routing. As my network is dual-stack and a lot of workstations are indeed dual-stack, this causes a significant delay on any browsing, as it has to wait for IPv6 to time out.

I've now added an NTP server to the setup.

What bugs me is that a reboot would solve it. So apparently the hardware clock remains OK, and only the system time drifts off. Now if only this behaviour were documented somewhere... Once you know what the problem is, you can find it.

cheers,

Rudy

-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
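The failure mode described above (aiccu refusing to start when the clock is more than roughly 145 seconds off) suggests a simple pre-flight check before starting the daemon. The following is only a sketch: the `clock_ok` helper and the way the offset is obtained (e.g. by parsing `ntpdate -q`) are my own assumptions, not part of aiccu.

```shell
#!/bin/sh
# Hypothetical pre-start check: refuse to start the tunnel daemon when
# the clock offset exceeds the ~145 s threshold observed in this report.
# How you obtain the offset is up to you; one option is parsing the
# output of `ntpdate -q <server>`.

MAX_OFFSET=145   # seconds, from the behaviour described above

clock_ok() {
    # $1 = signed clock offset in whole seconds
    offset=${1#-}                  # strip a leading minus sign
    [ "$offset" -le "$MAX_OFFSET" ]
}

offset=${1:-0}                     # offset passed as first argument for this sketch
if clock_ok "$offset"; then
    echo "clock within bounds, aiccu may start"
else
    echo "clock is ${offset}s off, fix the time (e.g. via NTP) first"
fi
```

With NTP keeping the clock in sync, as the message above ends up doing, this check should always pass; it mainly guards against starting the tunnel on a freshly booted box whose clock has not been stepped yet.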
Bug#678519: after about a month, routing gets wedged
On 08-07-12 03:02, Jonathan Nieder wrote:
> Rudy Zijlstra wrote:
>> Still using it. It's my firewall. Sorry for missing the questions.
> Yes, sorry about the clutter in my message.
[...]
>> 1/ always behaved this way? Not certain.
[...]
>> 2/ how many times? At least twice. It is well possible that earlier cases
>> were masked by reboots for other reasons, and it has taken me some time
>> before I linked particular slow network behaviour to a firewall problem.
>> 3/ how stable is the 1 month gestation time? No certainty on this one.

OK, the problem is back, about 3 days more than 1 month after the last occurrence. IPv4 browsing is very slow, and IPv6 routing is down. I can no longer ping6 ipv6.google.com: it gets the record, but no responses, not even when run on the firewall itself.

Output on June 25:

== IPv6 ==
ip -f inet6 neigh show
2001:610:73e:0:f53f:5d0e:28cc:3479 dev eth2 lladdr f0:4d:a2:fa:5c:67 REACHABLE
2001:610:73e:0:225:64ff:fea4:928e dev eth2 lladdr 00:25:64:a4:92:8e REACHABLE
2001:610:73e:0:d6be:d9ff:fe12:73f0 dev eth2 lladdr d4:be:d9:12:73:f0 REACHABLE
2001:610:73e:0:206:5bff:fef7:45e5 dev eth2 lladdr 00:06:5b:f7:45:e5 REACHABLE
fe80::208:2ff:fea3:d56b dev eth2 lladdr 00:08:02:a3:d5:6b router STALE

ip -f inet6 route list cache
2001:610:73e:0:206:5bff:fef7:45e5 via 2001:610:73e:0:206:5bff:fef7:45e5 dev eth2 metric 0
    cache mtu 1500 advmss 1440 hoplimit 0
2001:610:73e:0:225:64ff:fea4:928e via 2001:610:73e:0:225:64ff:fea4:928e dev eth2 metric 0
    cache mtu 1500 advmss 1440 hoplimit 0
2001:610:73e:0:d6be:d9ff:fe12:73f0 via 2001:610:73e:0:d6be:d9ff:fe12:73f0 dev eth2 metric 0
    cache mtu 1500 advmss 1440 hoplimit 0
2001:610:73e:0:f53f:5d0e:28cc:3479 via 2001:610:73e:0:f53f:5d0e:28cc:3479 dev eth2 metric 0
    cache mtu 1500 advmss 1440 hoplimit 0

Current output:

# ip -f inet6 neigh show
2001:610:73e:0:21b:21ff:fe22:b647 dev eth2 lladdr 00:1b:21:22:b6:47 STALE
2001:610:73e::15 dev eth2 lladdr 00:25:64:a4:92:8e REACHABLE
fe80::208:2ff:fea3:d56b dev eth2 lladdr 00:08:02:a3:d5:6b router STALE
fe80::21b:21ff:fe22:b647 dev eth2 lladdr 00:1b:21:22:b6:47 DELAY

# ip -f inet6 route list cache
2001:610:73e::15 via 2001:610:73e::15 dev eth2 metric 0
    cache mtu 1500 advmss 1440 hoplimit 0
2001:610:73e:0:21b:21ff:fe22:b647 via 2001:610:73e:0:21b:21ff:fe22:b647 dev eth2 metric 0
    cache mtu 1500 advmss 1440 hoplimit 0

"ip -f inet6 route flush" makes no difference, and neither does flushing the neighbour cache. ifdown/ifup of the external ethernet port makes no difference. "rmmod tg3" removed both interfaces (so that driver does indeed handle those ports), but the subsequent "modprobe tg3" made no difference either. The package firmware-linux-nonfree is current.

Stopping aiccu, rmmod of sit and tunnel4, and then reloading and restarting aiccu did solve it. Next time I will start with restarting aiccu, without rmmod-ing the related modules.

Cheers,

Rudy
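The recovery sequence that worked can be written down as a short checklist. This sketch deliberately only prints the commands rather than executing them; the init-script path and module names are taken from the message above and are squeeze-era assumptions, so verify them against your own setup before running anything as root.

```shell
#!/bin/sh
# Sketch of the recovery sequence reported to work above.
# Printed rather than executed: run the commands by hand (as root)
# once you have confirmed they match your system.
RECOVERY='/etc/init.d/aiccu stop
rmmod sit tunnel4
modprobe sit
/etc/init.d/aiccu start'

printf '%s\n' "$RECOVERY"
```

Per the message above, the rmmod/modprobe of the tunnel modules may not even be necessary; restarting aiccu alone is the first thing to try next time.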
Bug#678519: after about a month, routing gets wedged
On 07-07-12 21:57, Jonathan Nieder wrote:
> Jonathan Nieder wrote:
>> Rudy, is this a regression, or has this system always behaved this way?
>> How many times has it happened? How reliable is the 1 month gestation
>> time? When did it start?
> Ping. Do you still have access to this machine?

Still using it. It's my firewall. Sorry for missing the questions; with the combination of top/bottom posting I had missed the bottom part.

Your questions:

1/ Always behaved this way? Not certain. I installed it, then there were a number of changes that also had impact on the firewall, which caused some reboots (like changing to the new version of squid3 instead of squid2). Strictly speaking a reboot was not needed, but after major changes I always test whether the system comes back correctly from a reboot. I also had some squeeze kernel updates, which do need a reboot.

2/ How many times? At least twice. It is well possible that earlier cases were masked by reboots for other reasons, and it has taken me some time before I linked particular slow network behaviour to a firewall problem.

3/ How stable is the 1 month gestation time? No certainty on this one. After the last time I had confirmation this was a firewall problem, and had confirmation for two occurrences. Thinking back, the timespan between the two certain occasions is 3-4 weeks, but I did not keep a record.

4/ When did it start? Do not know; see above.

cheers,

Rudy
Bug#679719: mdadm monthly cron job should NOT check all arrays at the same time
Package: mdadm
Version: 3.1.4-1+8efb9d1+squeeze1
Severity: normal

I have deleted -- after finding through pain that it existed -- the monthly mdadm cron job in cron.d. The problem is that this cron job causes ALL RAID arrays to be checked at the same time, producing a potentially very high I/O load. In my particular case, with 2 RAID 6 and 2 RAID 5 arrays, it causes an I/O load that overloads the link between the CPU and the expansion cabinet. This WILL result in a number of disks being kicked from arrays, as they cannot answer in time. As it is not predictable which disks will be kicked, this can cause data loss, resulting in the need to recover from backup.

I have written a script in cron.daily that causes each array to be checked weekly, and never more than one at the same time.

I am aware that in a professional environment this problem should not occur. For home/small business users, though, a setup like mine is well usable, BUT care must be taken that the I/O rate remains within limits, and checking all arrays at the same time can cause a very significant load. The currently running recovery on the 2 RAID 6 arrays is causing an I/O load of about 1500-1600 tps. I have not checked what the limit is; doing a check on all 4 arrays is clearly over the limit, though.
-- Package-specific info:
--- mdadm.conf
DEVICE partitions
CREATE owner=root group=disk mode=0660 auto=yes
HOMEHOST <system>
MAILADDR r...@romunt.nl
ARRAY /dev/md/0 metadata=1.2 UUID=0482022f:ea8b960d:d4f06e8c:cd86f783 name=mythfiler2:0
ARRAY /dev/md/1 metadata=1.2 UUID=08e1195f:ccecb5ca:f4ad5439:f950bc53 name=mythfiler2:1
ARRAY /dev/md/2 metadata=1.2 UUID=1f5e84d1:bb0477a4:96a852a0:29ec7e81 name=mythfiler2:2
ARRAY /dev/md4 metadata=1.2 name=mythfiler:4 UUID=045c7d02:f0d86b64:74bb6a9a:45589b23

--- /etc/default/mdadm
INITRDSTART='none'
AUTOSTART=true
AUTOCHECK=true
START_DAEMON=true
DAEMON_OPTIONS=--syslog
VERBOSE=false

--- /proc/mdstat:
Personalities : [raid6] [raid5] [raid4]
md4 : active raid5 sda[0] sdf[4] sde[2] sdb[1]
      4395405312 blocks super 1.2 level 5, 4096k chunk, algorithm 2 [4/4] []
md2 : active raid5 sdi[0] sdn[4] sdm[2] sdj[1]
      4395405312 blocks super 1.2 level 5, 4096k chunk, algorithm 2 [4/4] []
md1 : active raid6 sdv[7](S) sdu[6] sdc[0] sdq[5] sds[4] sdk[1](F) sdo[2] sdg[3](F)
      7814037504 blocks super 1.2 level 6, 4096k chunk, algorithm 2 [6/4] [U_U_UU]
      [] recovery = 0.3% (6986240/1953509376) finish=5639.4min speed=5752K/sec
md0 : active raid6 sdw[7] sdd[6](F) sdt[5] sdr[4] sdp[3] sdl[2] sdh[1]
      7814037504 blocks super 1.2 level 6, 4096k chunk, algorithm 2 [6/5] [_U]
      [] recovery = 0.4% (8924416/1953509376) finish=3129.2min speed=10356K/sec
unused devices: <none>

--- /proc/partitions:
major minor  #blocks  name

 104     0  143367120 cciss/c0d0
 104     1     487424 cciss/c0d0p1
 104     2    5859328 cciss/c0d0p2
 104     3    5859328 cciss/c0d0p3
 104     4          1 cciss/c0d0p4
 104     5    5858304 cciss/c0d0p5
 104     6    3905536 cciss/c0d0p6
 104     7   14647296 cciss/c0d0p7
 104     8  106743808 cciss/c0d0p8
   8     0 1465138584 sda
   8    16 1465138584 sdb
   8    32 1953514584 sdc
   8    64 1465138584 sde
   8    80 1465138584 sdf
   8   112 1953514584 sdh
   8   128 1465138584 sdi
   8   144 1465138584 sdj
   8   176 1953514584 sdl
   8   192 1465138584 sdm
   8   208 1465138584 sdn
   8   224 1953514584 sdo
   8   240 1953514584 sdp
  65     0 1953514584 sdq
  65    16 1953514584 sdr
  65    32 1953514584 sds
  65    48 1953514584 sdt
   9     0 7814037504 md0
   9     1 7814037504 md1
   9     2 4395405312 md2
   9     4 4395405312 md4
  65    64 1953514584 sdu
  65    80 1953514584 sdv
  65    96 1953514584 sdw

--- LVM physical volumes:
LVM does not seem to be used.

--- mount output
/dev/cciss/c0d0p2 on / type xfs (rw)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
/dev/cciss/c0d0p1 on /boot type xfs (rw)
/dev/cciss/c0d0p7 on /data/sql type xfs (rw)
/dev/cciss/c0d0p8 on /home type xfs (rw)
/dev/cciss/c0d0p6 on /tmp type xfs (rw)
/dev/cciss/c0d0p5 on /var type xfs (rw)
/dev/md0 on /data/mythstorage0 type xfs (rw)
/dev/md1 on /data/mythstorage1 type xfs (rw)
/dev/md2 on /data/huishouden type xfs (rw)
/dev/md4 on /data/mythstorage2 type xfs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

--- initrd.img-2.6.38.6:
14563 blocks
2c62ad86f2b72f3d0118fa01e9dc9c8f  ./etc/mdadm/mdadm.conf
cb1cf979d6024e34c525db1cb6069ddd  ./lib/modules/2.6.38.6/kernel/drivers/md/linear.ko
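The cron.daily script mentioned in the report is not attached, so here is a minimal sketch of such a staggered check. The array names md0/md1/md2/md4 come from the /proc/mdstat in the report; the `array_for_day` helper and the weekday mapping are my own invention, not the reporter's actual script.

```shell
#!/bin/sh
# Sketch of a staggered RAID check for /etc/cron.daily: each array gets
# its check on a different weekday, so at most one check runs at a time
# and the I/O load stays bounded. Adjust the array names to your setup.

# Map day-of-week (1=Mon..7=Sun) to the array due for a check today.
array_for_day() {
    case "$1" in
        1) echo md0 ;;
        2) echo md1 ;;
        3) echo md2 ;;
        4) echo md4 ;;
        *) echo "" ;;   # no check on the remaining days
    esac
}

md=$(array_for_day "$(date +%u)")
if [ -n "$md" ] && [ -w "/sys/block/$md/md/sync_action" ]; then
    # Skip if a resync/recovery is already running on any array,
    # so a degraded rebuild never competes with a scheduled check.
    if ! grep -q resync /proc/mdstat; then
        echo check > "/sys/block/$md/md/sync_action"
    fi
fi
```

Debian's own answer to this class of problem is the `checkarray` helper shipped with mdadm, which also supports checking arrays one at a time; the sketch above just makes the scheduling idea explicit.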
Bug#551555: mountnfs.sh: start should declare dependency on name resolver
Is anything happening on this bug? I note that wheezy still has it, and I have to mount my NFS shares manually after a reboot.

cheers,

Rudy
Bug#678519: general: after about 1 month of uptime, routing of IPv6 packets is no longer possible, and IPv4 routing becomes slow and unpredictable. Rebooting brings all functionality back, and back to
On 22-06-12 21:38, Henrique de Moraes Holschuh wrote:
> On Fri, 22 Jun 2012, Rudy Zijlstra wrote:
>> let system run with IPv4 + IPv6 routing for about 1 month
>> IPv6 routing will start to fail
>> IPv4 routing becomes slow and unpredictable
>> no obvious causes visible in the system; top and friends do not show a cpu hog
>> a reboot will bring the system back to normal behaviour
>
> Please use (as root) "ip neigh show" and "ip route list cache" to try to
> track down any weird differences between the box when it is behaving
> normally and the box when wedged. You may want to compare it to a healthy
> box on the same network segment.
>
> You can also try to see if "ip route flush cache" and "ip neigh flush" can
> unwedge the system. After a flush, "ip neigh show" and "ip route list
> cache" should return very few, if any, entries.

Thanks, I've stored the current output of these commands, including the IPv6 version, so I can compare when trouble hits again in some weeks.
Bug#678519: general: after about 1 month of uptime, routing of IPv6 packets is no longer possible, and IPv4 routing becomes slow and unpredictable. Rebooting brings all functionality back, and back to
On 23-06-12 14:53, Henrique de Moraes Holschuh wrote:
> On Sat, 23 Jun 2012, Rudy Zijlstra wrote:
>> On 22-06-12 21:38, Henrique de Moraes Holschuh wrote:
>>> Please use (as root) "ip neigh show" and "ip route list cache" to try to
>>> track down any weird differences between the box when it is behaving
>>> normally and the box when wedged. You may want to compare it to a healthy
>>> box on the same network segment.
>>>
>>> You can also try to see if "ip route flush cache" and "ip neigh flush"
>>> can unwedge the system. After a flush, "ip neigh show" and "ip route
>>> list cache" should return very few, if any, entries.
>> Thanks, I've stored the current output of these commands, including the
>> IPv6 version, so I can compare when trouble hits again in some weeks.
>
> You probably want to store their output once a day. If it is a
> neighbour/route cache leak or malfunction of some sort (e.g. routes
> getting stuck in the presence of ICMP redirects), you should be able to
> notice that old crap is accumulating over time.
>
> If possible, do the same on a box that does not show the same problem
> (ideally in the same network segment), so that you have a baseline to
> compare to.
>
> Note that it could be something else entirely. Don't rule out hardware
> malfunction (sometimes cleared if you down the interfaces and then bring
> them up again) or driver issues (sometimes cleared if you rmmod + modprobe
> the buggy driver). And make sure the box is running the latest firmware
> (BIOS/UEFI, NIC firmware...).

I'll script the commands from cron.daily.

Comparing with a similar box is kind of difficult: I run only a single firewall, and although I have several squeeze boxes active, this is the only one showing this problem.

The NIC firmware is the latest, on condition that squeeze ships the latest. I do expect that, though, as it is pretty old hardware. Fully capable as a firewall, though.
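The cron.daily snapshot promised above could look something like this sketch. The log directory and file-naming scheme are my own assumptions; the commands themselves are the ones suggested in the thread.

```shell
#!/bin/sh
# Daily snapshot of the neighbour and route caches, for /etc/cron.daily,
# so that accumulating cruft can be spotted by diffing day-to-day output.
# LOGDIR and the file naming are assumptions; adjust to taste.
LOGDIR=/var/log/net-snapshots
mkdir -p "$LOGDIR" 2>/dev/null || LOGDIR=${TMPDIR:-/tmp}/net-snapshots
mkdir -p "$LOGDIR"
stamp=$(date +%Y%m%d)

# Capture both IPv4 (inet) and IPv6 (inet6), one file per family per day.
for fam in inet inet6; do
    {
        echo "== ip -f $fam neigh show =="
        ip -f "$fam" neigh show
        echo "== ip -f $fam route list cache =="
        ip -f "$fam" route list cache
    } > "$LOGDIR/$fam-$stamp.log" 2>&1
done
```

Comparing two days is then a plain `diff "$LOGDIR/inet6-20120623.log" "$LOGDIR/inet6-20120624.log"`; a steadily growing diff would point at the cache leak Henrique describes.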
Bug#678519: general: after about 1 month of uptime, routing of IPv6 packets is no longer possible, and IPv4 routing becomes slow and unpredictable. Rebooting brings all functionality back, and back to
Package: general
Severity: important
Tags: ipv6

let system run with IPv4 + IPv6 routing for about 1 month
IPv6 routing will start to fail
IPv4 routing becomes slow and unpredictable
no obvious causes visible in the system; top and friends do not show a cpu hog
a reboot will bring the system back to normal behaviour

-- System Information:
Debian Release: 6.0.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: i386 (i686)
Kernel: Linux 2.6.32-5-686 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Bug#678519: general: after about 1 month of uptime, routing of IPv6 packets is no longer possible, and IPv4 routing becomes slow and unpredictable. Rebooting brings all functionality back, and back to
On 22-06-12 15:04, Roberto C. Sánchez wrote:
> On Fri, Jun 22, 2012 at 01:59:37PM +0200, Rudy Zijlstra wrote:
>> Package: general
>> Severity: important
>> Tags: ipv6
>>
>> let system run with IPv4 + IPv6 routing for about 1 month
>> IPv6 routing will start to fail
>> IPv4 routing becomes slow and unpredictable
>> no obvious causes visible in the system; top and friends do not show a cpu hog
>> a reboot will bring the system back to normal behaviour
>
> Could this be something to do with connection tracking?
>
> Regards,
>
> -Roberto

Both IPv4 and IPv6 are impacted, and they have separate iptables. IPv6 routing gets fully blocked; IPv4 goes slow and unpredictable. How could I check any relation to connection tracking?

cheers,

Rudy