Bug#851432: installation-reports: reboot or halt hangs after unmounting all disks on active update task

2017-01-14 Thread Rudy Zijlstra
Package: installation-reports
Severity: normal

Dear Maintainer,


   * What led up to the situation?
   ==> Any reboot or halt triggers it.
   ==> At shutdown, the system goes through all the needed actions and unmounts
   the remote and local disks, then waits for an update action to finish
   (which can never finish, as all disks have been unmounted). The stated
   wait time is 15 minutes and some seconds (but I did not wait that long).
   * What exactly did you do (or not do) that was effective (or ineffective)?
   ==> "Solved" by doing a hardware switch-off (holding the power button for 4 seconds).
   * What was the outcome of this action?
   ==> The system powered off.
   * What outcome did you expect instead?
   ==> A reboot should reboot, not result in a long wait.



-- Package-specific info:

Boot method: network for install, after that local disk (SSD)
Image version: netboot builddate 11 nov 2016
Date: Installed Nov 2016, with all updates applied since.

Machine: Dell precision T7500
Partitions: 


Base System Installation Checklist:
[O] = OK, [E] = Error (please elaborate below), [ ] = didn't try it

Initial boot:           [O]
Detect network card:    [O]
Configure network:      [O]
Detect CD:              [O]
Load installer modules: [O]
Clock/timezone setup:   [O]
User/password setup:    [O]
Detect hard drives:     [O]
Partition hard drives:  [O]
Install base system:    [O]
Install tasks:          [O]
Install boot loader:    [O]
Overall install:        [O]

Comments/Problems: the install seems OK; shutdown fails.



The system is normally either active or suspended to RAM. I did not shut it
down, as the system typically crashed before that. While starting to debug
that crash, and looking for other ongoing issues, I found the shutdown problem.

The ongoing issue is not related, and seems to be an initialisation issue of
the graphics subsystem.

-- 


==
Installer lsb-release:
==
DISTRIB_ID=Debian
DISTRIB_DESCRIPTION="Debian GNU/Linux installer"
DISTRIB_RELEASE="9 (stretch) - installer build 20161101-00:08"
X_INSTALLATION_MEDIUM=netboot

==
Installer hardware-summary:
==
uname -a: Linux cenedra 4.7.0-1-amd64 #1 SMP Debian 4.7.8-1 (2016-10-19) x86_64 GNU/Linux
lspci -knn: 00:00.0 Host bridge [0600]: Intel Corporation 5520 I/O Hub to ESI Port [8086:3406] (rev 13)
lspci -knn: Subsystem: Dell Device [1028:026d]
lspci -knn: 00:01.0 PCI bridge [0604]: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 [8086:3408] (rev 13)
lspci -knn: Kernel driver in use: pcieport
lspci -knn: 00:03.0 PCI bridge [0604]: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 [8086:340a] (rev 13)
lspci -knn: Kernel driver in use: pcieport
lspci -knn: 00:07.0 PCI bridge [0604]: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 [8086:340e] (rev 13)
lspci -knn: Kernel driver in use: pcieport
lspci -knn: 00:14.0 PIC [0800]: Intel Corporation 7500/5520/5500/X58 I/O Hub System Management Registers [8086:342e] (rev 13)
lspci -knn: Subsystem: Device [0028:006d]
lspci -knn: 00:14.1 PIC [0800]: Intel Corporation 7500/5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers [8086:3422] (rev 13)
lspci -knn: Subsystem: Device [0028:006d]
lspci -knn: 00:14.2 PIC [0800]: Intel Corporation 7500/5520/5500/X58 I/O Hub Control Status and RAS Registers [8086:3423] (rev 13)
lspci -knn: Subsystem: Device [0028:006d]
lspci -knn: 00:1a.0 USB controller [0c03]: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4 [8086:3a37]
lspci -knn: Subsystem: Dell Device [1028:026d]
lspci -knn: Kernel driver in use: uhci_hcd
lspci -knn: Kernel modules: uhci_hcd
lspci -knn: 00:1a.1 USB controller [0c03]: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5 [8086:3a38]
lspci -knn: Subsystem: Dell Device [1028:026d]
lspci -knn: Kernel driver in use: uhci_hcd
lspci -knn: Kernel modules: uhci_hcd
lspci -knn: 00:1a.2 USB controller [0c03]: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6 [8086:3a39]
lspci -knn: Subsystem: Dell Device [1028:026d]
lspci -knn: Kernel driver in use: uhci_hcd
lspci -knn: Kernel modules: uhci_hcd
lspci -knn: 00:1a.7 USB controller [0c03]: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2 [8086:3a3c]
lspci -knn: Subsystem: Dell Device [1028:026d]
lspci -knn: Kernel driver in use: ehci-pci
lspci -knn: Kernel modules: ehci_pci
lspci -knn: 00:1b.0 Audio device [0403]: Intel Corporation 

Bug#678519: routing wedged after 1 month

2012-08-18 Thread Rudy Zijlstra

Dears,

I think I have solved it. I will know for certain in a bit more than a
month, though.


Apparently aiccu goes off track when the system time is more than a
certain number of seconds off. At least with the clock off by more than 145
seconds, it will refuse to start.
When the situation occurs while it is running, it apparently kills
IPv6 routing. As my network is dual-stack and a lot of workstations are
indeed dual-stack, this causes a significant delay on any browsing, as
IPv6 needs to time out first.
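One way to keep an eye on this is to measure the local clock's offset against an NTP server and compare it with the roughly 145-second limit observed above. The following is a minimal SNTP sketch in Python; the 145 s threshold comes from this report, not from aiccu documentation, the function names are hypothetical, and network round-trip delay is ignored, so the offset is only approximate.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_DELTA = 2208988800

# Assumption: threshold taken from the behaviour observed in this report.
MAX_SKEW_SECONDS = 145.0


def build_sntp_request() -> bytes:
    """48-byte SNTP client request: first byte 0x1B = LI 0, version 3, mode 3 (client)."""
    return bytes([0x1B]) + bytes(47)


def transmit_time_from_packet(packet: bytes) -> float:
    """Convert the server's transmit timestamp (bytes 40..47) to Unix time."""
    if len(packet) < 48:
        raise ValueError("SNTP packet too short")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_UNIX_DELTA + fraction / 2**32


def query_server_time(host: str, timeout: float = 2.0) -> float:
    """Ask an NTP server (UDP port 123) for its current time; needs network access."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_sntp_request(), (host, 123))
        packet, _ = sock.recvfrom(512)
    return transmit_time_from_packet(packet)


def clock_offset(host: str) -> float:
    """Local clock minus server clock, in seconds (network delay ignored)."""
    return time.time() - query_server_time(host)


def skew_is_acceptable(offset: float, max_skew: float = MAX_SKEW_SECONDS) -> bool:
    """True if the clock offset stays within the assumed aiccu tolerance."""
    return abs(offset) <= max_skew
```

For example, `skew_is_acceptable(clock_offset("pool.ntp.org"))` could run from cron as a sanity check; with an NTP daemon running, the offset should normally stay far below the threshold.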


I've now added an NTP server to the setup.

What bugs me is that a reboot solves it. So apparently the hardware
clock stays correct, and only the system time drifts.


If only this behaviour were documented somewhere... Once you know what
the problem is, you can find it.



cheers,


Rudy


--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#678519: after about a month, routing gets wedged

2012-07-28 Thread Rudy Zijlstra

On 08-07-12 03:02, Jonathan Nieder wrote:

Rudy Zijlstra wrote:


Still using it. It's my firewall. Sorry for missing the questions.

Yes, sorry about the clutter in my message.

[...]

1/ always behaved this way
Not certain.

[...]

2/ how many times?
At least twice. It's quite possible that earlier cases were masked by
reboots for other reasons. And it took me some time before I
linked the particular slow network behaviour to a firewall problem.

3/ how stable is the 1 month gestation time?
no certainty on this one.

OK, the problem is back. About 3 days more than 1 month.

IPv4 browsing is very slow, and IPv6 routing is down. I can no longer ping6
ipv6.google.com: it gets the AAAA record, but no responses, not even
when doing it on the firewall itself.


Output on June 25:
== IPv6 ==
ip -f inet6 neigh show
2001:610:73e:0:f53f:5d0e:28cc:3479 dev eth2 lladdr f0:4d:a2:fa:5c:67 REACHABLE
2001:610:73e:0:225:64ff:fea4:928e dev eth2 lladdr 00:25:64:a4:92:8e REACHABLE
2001:610:73e:0:d6be:d9ff:fe12:73f0 dev eth2 lladdr d4:be:d9:12:73:f0 REACHABLE
2001:610:73e:0:206:5bff:fef7:45e5 dev eth2 lladdr 00:06:5b:f7:45:e5 REACHABLE
fe80::208:2ff:fea3:d56b dev eth2 lladdr 00:08:02:a3:d5:6b router STALE
ip -f inet6 route list cache
2001:610:73e:0:206:5bff:fef7:45e5 via 2001:610:73e:0:206:5bff:fef7:45e5 dev eth2  metric 0
    cache  mtu 1500 advmss 1440 hoplimit 0
2001:610:73e:0:225:64ff:fea4:928e via 2001:610:73e:0:225:64ff:fea4:928e dev eth2  metric 0
    cache  mtu 1500 advmss 1440 hoplimit 0
2001:610:73e:0:d6be:d9ff:fe12:73f0 via 2001:610:73e:0:d6be:d9ff:fe12:73f0 dev eth2  metric 0
    cache  mtu 1500 advmss 1440 hoplimit 0
2001:610:73e:0:f53f:5d0e:28cc:3479 via 2001:610:73e:0:f53f:5d0e:28cc:3479 dev eth2  metric 0
    cache  mtu 1500 advmss 1440 hoplimit 0


Current output:
# ip -f inet6 neigh show
2001:610:73e:0:21b:21ff:fe22:b647 dev eth2 lladdr 00:1b:21:22:b6:47 STALE
2001:610:73e::15 dev eth2 lladdr 00:25:64:a4:92:8e REACHABLE
fe80::208:2ff:fea3:d56b dev eth2 lladdr 00:08:02:a3:d5:6b router STALE
fe80::21b:21ff:fe22:b647 dev eth2 lladdr 00:1b:21:22:b6:47 DELAY
# ip -f inet6 route list cache
2001:610:73e::15 via 2001:610:73e::15 dev eth2  metric 0
    cache  mtu 1500 advmss 1440 hoplimit 0
2001:610:73e:0:21b:21ff:fe22:b647 via 2001:610:73e:0:21b:21ff:fe22:b647 dev eth2  metric 0
    cache  mtu 1500 advmss 1440 hoplimit 0
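Comparing two neighbour-table snapshots by eye is tedious; a small parser makes the differences explicit. This is a sketch assuming the one-entry-per-line `ip -f inet6 neigh show` format shown above; the function names are hypothetical and lines that do not match the pattern are simply skipped. Fed the June 25 and current output, it should report the four global addresses as gone, 2001:610:73e::15, 2001:610:73e:0:21b:21ff:fe22:b647 and fe80::21b:21ff:fe22:b647 as new, and the router entry as unchanged (STALE in both).

```python
import re
from typing import Dict, List, Tuple

# Matches entries such as
# "fe80::208:2ff:fea3:d56b dev eth2 lladdr 00:08:02:a3:d5:6b router STALE"
NEIGH_RE = re.compile(
    r"^(?P<addr>\S+) dev (?P<dev>\S+)"
    r"(?: lladdr (?P<lladdr>\S+))?"
    r"(?P<router> router)?"
    r" (?P<state>[A-Z]+)$"
)


def parse_neigh(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each neighbour address to (link-layer address, NUD state)."""
    table: Dict[str, Tuple[str, str]] = {}
    for line in text.splitlines():
        m = NEIGH_RE.match(line.strip())
        if m:  # unparseable lines are ignored
            table[m["addr"]] = (m["lladdr"] or "", m["state"])
    return table


def diff_neigh(before: str, after: str) -> Dict[str, List[str]]:
    """Addresses that disappeared, appeared, or changed MAC/state between snapshots."""
    old, new = parse_neigh(before), parse_neigh(after)
    return {
        "gone": sorted(set(old) - set(new)),
        "new": sorted(set(new) - set(old)),
        "changed": sorted(a for a in set(old) & set(new) if old[a] != new[a]),
    }
```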


Running ip -f inet6 route flush makes no difference, and neither does
flushing the neighbor cache.


ifdown / ifup of the external Ethernet port makes no difference.

rmmod tg3 removed both interfaces (so the driver does indeed handle
those ports); following it with modprobe tg3 made no difference either.


The firmware-linux-nonfree package is current.

Stopping aiccu, unloading the sit and tunnel4 modules with rmmod, and then
reloading them and restarting aiccu did solve it.


Next time I will start by restarting aiccu only, without removing the
related modules.
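The recovery sequence that worked above can be written down as an explicit command list, which makes the ordering visible: sit depends on tunnel4, so sit has to be unloaded first, and `modprobe sit` pulls tunnel4 back in. This is a sketch assuming the squeeze-era init script path /etc/init.d/aiccu; the helper names are hypothetical, and by default it only prints the commands. `reload_modules=False` gives the lighter "restart aiccu only" variant planned for next time.

```python
import subprocess
from typing import List

# Assumption: sysvinit-style init script as on Debian squeeze.
AICCU_INIT = "/etc/init.d/aiccu"


def recovery_plan(reload_modules: bool = True) -> List[List[str]]:
    """Commands in the order they should run."""
    plan = [[AICCU_INIT, "stop"]]
    if reload_modules:
        # sit depends on tunnel4: remove sit first; modprobe sit reloads both.
        plan += [["rmmod", "sit"], ["rmmod", "tunnel4"], ["modprobe", "sit"]]
    plan.append([AICCU_INIT, "start"])
    return plan


def run_plan(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it (as root) only when dry_run is False."""
    for cmd in plan:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
```

Running `run_plan(recovery_plan(reload_modules=False))` first shows the minimal restart; only if that does not help would the full module reload be used.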


Cheers,


Rudy





Bug#678519: after about a month, routing gets wedged

2012-07-07 Thread Rudy Zijlstra

On 07-07-12 21:57, Jonathan Nieder wrote:

Jonathan Nieder wrote:


Rudy, is this a regression, or has this system always behaved this
way?  How many times has it happened?  How reliable is the 1 month
gestation time?  When did it start?

Ping.  Do you still have access to this machine?
Still using it. It's my firewall. Sorry for missing the questions. With
the combination of top and bottom posting, I had missed the bottom part.


Your questions:
1/ always behaved this way
Not certain. I installed it, then there were a number of changes that
also had an impact on the firewall, which caused some reboots (like
changing to the new squid3 instead of squid2). Strictly speaking no reboot
was needed, but after major changes I always test whether the system comes
back correctly from a reboot. I also had some squeeze kernel updates,
which do need a reboot.


2/ how many times?
At least twice. It's quite possible that earlier cases were masked by
reboots for other reasons. And it took me some time before I
linked the particular slow network behaviour to a firewall problem.


3/ how stable is the 1 month gestation time?
No certainty on this one. After the last occurrence I had confirmation that
this was a firewall problem, and I now have that confirmation for 2 occurrences.
Thinking back, the timespan between the 2 certain occasions is 3 - 4 weeks,
but I did not keep a record.


4/ When did it start
I do not know. See above.

cheers,


Rudy






Bug#679719: mdadm monthly cron job should NOT check all arrays at the same time

2012-06-30 Thread Rudy Zijlstra
Package: mdadm
Version: 3.1.4-1+8efb9d1+squeeze1
Severity: normal


After finding out the hard way that it existed, I have deleted the monthly
mdadm cron job in cron.d.

The problem is that this cron job will cause ALL RAIDs to be checked at the
same time, causing a potentially very high I/O load. In my particular case, with
2 RAID 6 and 2 RAID 5 arrays, it causes an I/O load that overloads the link
between the CPU and the expansion cabinet. This WILL result in a number of disks
being kicked from arrays, as they cannot answer in time. As it is not predictable
which disks will be kicked, this can cause data loss, resulting in the need to
recover from backup.

I have written a script in cron.daily that causes each RAID to be checked
weekly, and never more than 1 at the same time.
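A script along these lines could look like the following minimal sketch. It assumes the four arrays from this report and the standard md sysfs interface (/sys/block/mdX/md/sync_action, where writing "check" starts a check and "idle" means no sync is running). The weekday-to-array rotation and all function names are hypothetical; this is not the reporter's actual script. It checks one array per day (Monday to Thursday here) and skips the day entirely if any array is already checking, resyncing or recovering.

```python
import os
from typing import List, Optional

# Assumptions: array names from this report, standard md sysfs layout.
ARRAYS = ["md0", "md1", "md2", "md4"]
SYSFS = "/sys/block"


def array_for_day(arrays: List[str], weekday: int) -> Optional[str]:
    """Monday=0 ... Sunday=6; days beyond the number of arrays are rest days."""
    return arrays[weekday] if weekday < len(arrays) else None


def sync_action(md: str, sysfs: str = SYSFS) -> str:
    with open(os.path.join(sysfs, md, "md", "sync_action")) as f:
        return f.read().strip()


def any_array_busy(arrays: List[str], sysfs: str = SYSFS) -> bool:
    """True if any array is doing anything other than idling (check, resync, recover...)."""
    return any(sync_action(md, sysfs) != "idle" for md in arrays)


def start_check(md: str, sysfs: str = SYSFS) -> None:
    """Ask md to start a read-only consistency check (requires root)."""
    with open(os.path.join(sysfs, md, "md", "sync_action"), "w") as f:
        f.write("check\n")


def daily_job(weekday: int, arrays: List[str] = ARRAYS,
              sysfs: str = SYSFS) -> Optional[str]:
    """Start at most one check per day; return the array started, if any."""
    md = array_for_day(arrays, weekday)
    if md is None or any_array_busy(arrays, sysfs):
        return None  # rest day, or another sync is still running
    start_check(md, sysfs)
    return md
```

From cron.daily this would be called as root with `daily_job(time.localtime().tm_wday)`. If a day is skipped because another array is still busy, that array waits a full week; a more careful script might retry on the next rest day.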

I am aware that in a professional environment this problem should not occur.
For home/small business users, though, a setup like mine is quite usable. BUT
care must be taken that the I/O rate remains within limits, and checking all
RAIDs at the same time can cause a very significant I/O load.

The currently running recovery on the 2 RAID 6 arrays is causing an I/O load
of about 1500 - 1600 tps.

I have not checked what the limit is. Doing a check on all 4 RAIDs is clearly
over the limit, though.

-- Package-specific info:
--- mdadm.conf
DEVICE partitions
CREATE owner=root group=disk mode=0660 auto=yes
HOMEHOST system
MAILADDR r...@romunt.nl
ARRAY /dev/md/0 metadata=1.2 UUID=0482022f:ea8b960d:d4f06e8c:cd86f783 name=mythfiler2:0
ARRAY /dev/md/1 metadata=1.2 UUID=08e1195f:ccecb5ca:f4ad5439:f950bc53 name=mythfiler2:1
ARRAY /dev/md/2 metadata=1.2 UUID=1f5e84d1:bb0477a4:96a852a0:29ec7e81 name=mythfiler2:2
ARRAY /dev/md4 metadata=1.2 name=mythfiler:4 UUID=045c7d02:f0d86b64:74bb6a9a:45589b23

--- /etc/default/mdadm
INITRDSTART='none'
AUTOSTART=true
AUTOCHECK=true
START_DAEMON=true
DAEMON_OPTIONS=--syslog
VERBOSE=false

--- /proc/mdstat:
Personalities : [raid6] [raid5] [raid4] 
md4 : active raid5 sda[0] sdf[4] sde[2] sdb[1]
  4395405312 blocks super 1.2 level 5, 4096k chunk, algorithm 2 [4/4] []

md2 : active raid5 sdi[0] sdn[4] sdm[2] sdj[1]
  4395405312 blocks super 1.2 level 5, 4096k chunk, algorithm 2 [4/4] []

md1 : active raid6 sdv[7](S) sdu[6] sdc[0] sdq[5] sds[4] sdk[1](F) sdo[2] sdg[3](F)
  7814037504 blocks super 1.2 level 6, 4096k chunk, algorithm 2 [6/4] [U_U_UU]
  []  recovery =  0.3% (6986240/1953509376) finish=5639.4min speed=5752K/sec

md0 : active raid6 sdw[7] sdd[6](F) sdt[5] sdr[4] sdp[3] sdl[2] sdh[1]
  7814037504 blocks super 1.2 level 6, 4096k chunk, algorithm 2 [6/5] [_U]
  []  recovery =  0.4% (8924416/1953509376) finish=3129.2min speed=10356K/sec

unused devices: none
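The recovery figures above can be checked by hand: /proc/mdstat counts in 1 KiB blocks and reports speed in KiB/s, so the remaining time at the current speed is (total - done) / speed. A small parser, sketched under the assumption that each progress line sits on one line as shown (names are hypothetical), also lets a cron script refuse to start a check while any array reports progress. The kernel's own finish= value comes from its own rate estimate, so it differs slightly: for md1 the plain division gives about 5640.1 min against the reported 5639.4 min.

```python
import re
from typing import NamedTuple, Optional

# Matches e.g. "recovery =  0.3% (6986240/1953509376) finish=5639.4min speed=5752K/sec"
PROGRESS_RE = re.compile(
    r"(?P<action>recovery|resync|check|reshape)\s*=\s*(?P<pct>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s*"
    r"finish=(?P<finish>[\d.]+)min\s*speed=(?P<speed>\d+)K/sec"
)


class Progress(NamedTuple):
    action: str
    done_kib: int
    total_kib: int
    speed_kib_s: int
    kernel_finish_min: float

    def eta_minutes(self) -> float:
        """Remaining minutes at the current speed (1 KiB blocks, KiB/s)."""
        return (self.total_kib - self.done_kib) / self.speed_kib_s / 60


def parse_progress(line: str) -> Optional[Progress]:
    """Return the parsed progress, or None if the line carries no progress info."""
    m = PROGRESS_RE.search(line)
    if not m:
        return None
    return Progress(m["action"], int(m["done"]), int(m["total"]),
                    int(m["speed"]), float(m["finish"]))
```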

--- /proc/partitions:
major minor  #blocks  name

 104        0  143367120 cciss/c0d0
 104        1     487424 cciss/c0d0p1
 104        2    5859328 cciss/c0d0p2
 104        3    5859328 cciss/c0d0p3
 104        4          1 cciss/c0d0p4
 104        5    5858304 cciss/c0d0p5
 104        6    3905536 cciss/c0d0p6
 104        7   14647296 cciss/c0d0p7
 104        8  106743808 cciss/c0d0p8
   8        0 1465138584 sda
   8       16 1465138584 sdb
   8       32 1953514584 sdc
   8       64 1465138584 sde
   8       80 1465138584 sdf
   8      112 1953514584 sdh
   8      128 1465138584 sdi
   8      144 1465138584 sdj
   8      176 1953514584 sdl
   8      192 1465138584 sdm
   8      208 1465138584 sdn
   8      224 1953514584 sdo
   8      240 1953514584 sdp
  65        0 1953514584 sdq
  65       16 1953514584 sdr
  65       32 1953514584 sds
  65       48 1953514584 sdt
   9        0 7814037504 md0
   9        1 7814037504 md1
   9        2 4395405312 md2
   9        4 4395405312 md4
  65       64 1953514584 sdu
  65       80 1953514584 sdv
  65       96 1953514584 sdw

--- LVM physical volumes:
LVM does not seem to be used.
--- mount output
/dev/cciss/c0d0p2 on / type xfs (rw)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
/dev/cciss/c0d0p1 on /boot type xfs (rw)
/dev/cciss/c0d0p7 on /data/sql type xfs (rw)
/dev/cciss/c0d0p8 on /home type xfs (rw)
/dev/cciss/c0d0p6 on /tmp type xfs (rw)
/dev/cciss/c0d0p5 on /var type xfs (rw)
/dev/md0 on /data/mythstorage0 type xfs (rw)
/dev/md1 on /data/mythstorage1 type xfs (rw)
/dev/md2 on /data/huishouden type xfs (rw)
/dev/md4 on /data/mythstorage2 type xfs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

--- initrd.img-2.6.38.6:
14563 blocks
2c62ad86f2b72f3d0118fa01e9dc9c8f  ./etc/mdadm/mdadm.conf
cb1cf979d6024e34c525db1cb6069ddd  ./lib/modules/2.6.38.6/kernel/drivers/md/linear.ko

Bug#551555: mountnfs.sh: start should declare dependency on name resolver

2012-06-30 Thread Rudy Zijlstra

Anything happening on this bug?

I note that wheezy still has it, and I have to mount my NFS shares manually
after a reboot.


cheers,


Rudy






Bug#678519: general: after about 1 month of uptime, routing of IPv6 packets is no longer possible, and IPv4 routing becomes slow and unpredictable. Rebooting brings all functionality back, and back to

2012-06-23 Thread Rudy Zijlstra

On 22-06-12 21:38, Henrique de Moraes Holschuh wrote:

On Fri, 22 Jun 2012, Rudy Zijlstra wrote:

Let the system run with IPv4 and IPv6 routing for about 1 month.

IPv6 routing will start to fail.
IPv4 routing becomes slow and unpredictable.

No obvious causes are visible in the system; top and friends do not show a CPU hog.

A reboot will bring the system back to normal behaviour.

Please use (as root) ip neigh show, and ip route list cache to try to
track down any weird differences between the box when it is behaving
normally, and the box when wedged.  You may want to compare it to a healthy
box on the same network segment.

You can also try to see if ip route flush cache and ip neigh flush can
unwedge the system.  After a flush, ip neigh show and ip route list
cache should return very few, if any, entries.

Thanks, I've stored the current output of these commands, including the
IPv6 version, so I can compare when trouble hits again in some weeks.







Bug#678519: general: after about 1 month of uptime, routing of IPv6 packets is no longer possible, and IPv4 routing becomes slow and unpredictable. Rebooting brings all functionality back, and back to

2012-06-23 Thread Rudy Zijlstra

On 23-06-12 14:53, Henrique de Moraes Holschuh wrote:

On Sat, 23 Jun 2012, Rudy Zijlstra wrote:

On 22-06-12 21:38, Henrique de Moraes Holschuh wrote:

On Fri, 22 Jun 2012, Rudy Zijlstra wrote:

Let the system run with IPv4 and IPv6 routing for about 1 month.

IPv6 routing will start to fail.
IPv4 routing becomes slow and unpredictable.

No obvious causes are visible in the system; top and friends do not show a CPU hog.

A reboot will bring the system back to normal behaviour.

Please use (as root) ip neigh show, and ip route list cache to try to
track down any weird differences between the box when it is behaving
normally, and the box when wedged.  You may want to compare it to a healthy
box on the same network segment.

You can also try to see if ip route flush cache and ip neigh flush can
unwedge the system.  After a flush, ip neigh show and ip route list
cache should return very few, if any, entries.


Thanks, I've stored the current output of these commands, including
the IPv6 version, so I can compare when trouble hits again in some
weeks.

You probably want to store their output once a day.  If it is a
neighbour/route cache leak or malfunction of some sort (e.g. routes getting
stuck in the presence of ICMP redirects), you should be able to notice that
old crap is accumulating over time.

If possible, do the same in a box that does not show the same problem
(ideally in the same network segment), so that you have a baseline to
compare to.

Note that it could be something else entirely, don't rule out hardware
malfunction (sometimes cleared if you down the interfaces and then bring
them up again), or driver issues (sometimes cleared if you rmmod + modprobe
the buggy driver).  And make sure the box is running the latest firmware
(BIOS/UEFI, NIC firmware...).

I'll script the commands from cron.daily. Comparing with a similar box is
kind of difficult, as I run only a single firewall.
And although I have several squeeze boxes active, this is the only one
showing this problem.
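Such a daily job could be sketched as follows, assuming Python is available on the firewall. It runs the IPv4 and IPv6 variants of the commands suggested in this thread and writes them to one dated file per day; the routing-YYYY-MM-DD.txt naming, the output directory and the function names are hypothetical choices. Comparing files from successive days should show whether stale entries accumulate over time.

```python
import datetime
import os
import subprocess
from typing import Callable, List, Optional

# The commands suggested in this thread, for both address families.
COMMANDS: List[List[str]] = [
    ["ip", "neigh", "show"],
    ["ip", "route", "list", "cache"],
    ["ip", "-f", "inet6", "neigh", "show"],
    ["ip", "-f", "inet6", "route", "list", "cache"],
]


def run_command(cmd: List[str]) -> str:
    """Run one command and return its stdout (stderr is discarded)."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL, universal_newlines=True)
    return result.stdout


def render_snapshot(runner: Callable[[List[str]], str] = run_command) -> str:
    """One '# command' header followed by its output, for every command."""
    return "\n".join(f"# {' '.join(cmd)}\n{runner(cmd)}" for cmd in COMMANDS)


def snapshot_filename(day: datetime.date) -> str:
    return f"routing-{day.isoformat()}.txt"


def save_snapshot(outdir: str, day: Optional[datetime.date] = None) -> str:
    """Write today's snapshot into outdir and return the file path."""
    day = day or datetime.date.today()
    path = os.path.join(outdir, snapshot_filename(day))
    with open(path, "w") as f:
        f.write(render_snapshot())
    return path
```

A cron.daily wrapper would then only need to call something like `save_snapshot("/var/local/routing-snapshots")` (hypothetical directory, created beforehand).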


The NIC firmware is the latest, provided that Squeeze ships the latest.
I do expect that, as it is pretty old hardware, though fully capable as a
firewall.







Bug#678519: general: after about 1 month of uptime, routing of IPv6 packets is no longer possible, and IPv4 routing becomes slow and unpredictable. Rebooting brings all functionality back, and back to

2012-06-22 Thread Rudy Zijlstra
Package: general
Severity: important
Tags: ipv6

Let the system run with IPv4 and IPv6 routing for about 1 month:
 IPv6 routing will start to fail
 IPv4 routing becomes slow and unpredictable

No obvious causes are visible in the system; top and friends do not show a CPU hog.

A reboot will bring the system back to normal behaviour.

-- System Information:
Debian Release: 6.0.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.32-5-686 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash






Bug#678519: general: after about 1 month of uptime, routing of IPv6 packets is no longer possible, and IPv4 routing becomes slow and unpredictable. Rebooting brings all functionality back, and back to

2012-06-22 Thread Rudy Zijlstra

On 22-06-12 15:04, Roberto C. Sánchez wrote:

On Fri, Jun 22, 2012 at 01:59:37PM +0200, Rudy Zijlstra wrote:

Package: general
Severity: important
Tags: ipv6

Let the system run with IPv4 and IPv6 routing for about 1 month.

IPv6 routing will start to fail.
IPv4 routing becomes slow and unpredictable.

No obvious causes are visible in the system; top and friends do not show a CPU hog.

A reboot will bring the system back to normal behaviour.


Could this be something to do with connection tracking?

Regards,

-Roberto
Both IPv4 and IPv6 are affected, and they have separate iptables rule sets.
IPv6 routing gets fully blocked, while IPv4 becomes slow and unpredictable.


How could I check for any relation to connection tracking?
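One way to look for a relation is to compare the number of tracked connections with the table limit. Note that separate iptables and ip6tables rule sets do not rule this out, since the nf_conntrack table holds entries for both families. The sketch below reads the netfilter sysctls nf_conntrack_count and nf_conntrack_max under /proc/sys/net/netfilter (present when the nf_conntrack module is loaded); the 90% threshold and the function names are assumptions. When the table does fill up, the kernel typically logs "nf_conntrack: table full, dropping packet", so checking the kernel log is a useful complement.

```python
import os
from typing import Tuple

# Standard location of the netfilter conntrack sysctls.
PROC_DIR = "/proc/sys/net/netfilter"


def read_int(path: str) -> int:
    with open(path) as f:
        return int(f.read().strip())


def conntrack_usage(proc_dir: str = PROC_DIR) -> Tuple[int, int]:
    """(current number of tracked connections, table maximum)."""
    return (read_int(os.path.join(proc_dir, "nf_conntrack_count")),
            read_int(os.path.join(proc_dir, "nf_conntrack_max")))


def table_nearly_full(count: int, maximum: int, threshold: float = 0.9) -> bool:
    """True if usage reaches the (arbitrary) fraction `threshold` of the maximum."""
    return count >= threshold * maximum
```

Logging `conntrack_usage()` alongside the daily routing snapshots would show whether the count creeps towards the maximum over the month.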

cheers,


Rudy


