Bug#583312: possible fix
Good morning

After further hunting around, I may have found a solution.

A little background: adding the line

ServerTimeout=120

to the [X-*-Core] section of /etc/kde4/kdm/kdmrc enabled the system to boot to a normal login screen. While the problem was present, I was counting 7 to 8 seconds, give or take, before being dropped from the green nVidia logo back to the console. With that line added, I was counting 12 to 15 seconds before being presented with the logon screen.

Wanting to test boot order, I deleted the line from kdmrc and went to /etc/rc2.d. The order of the scripts was:

S01nvidia-kernel
S01quemu-kvm
S01speech-despatcher
S14portmap
S15nfs-common
S17nvidia-glx

I changed S17nvidia-glx to S7nvidia-glx, and the beastie booted up to a login screen with no problems, at about the same speed (give or take) as before the installation of the initscripts and sysv... stuff yesterday.

Now, this is not optimal: some update at some time in the future will again reorder the scripts, and the problem is very likely to repeat itself.

May I suggest that the scripts which order the entries in rc2.d be altered so that nvidia-glx is actioned earlier in the boot sequence? Though the problem may not affect everyone, doing so may reduce or even eliminate incidences of this issue in the future.

With greetings
Romane

--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
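The kdmrc change described in this message can be checked from a shell. The following is only a sketch: the helper name kdm_server_timeout is made up for illustration, and it assumes the setting is written without spaces around the equals sign, as in the message.

```shell
# Hypothetical helper: print the ServerTimeout value found in the
# [X-*-Core] section of a kdmrc-style file given as $1.
kdm_server_timeout() {
    awk -F= '
        /^\[/ { in_core = ($0 == "[X-*-Core]") }        # track the current section
        in_core && $1 == "ServerTimeout" { print $2 }   # value inside that section only
    ' "$1"
}

# Query the file named in the message, if it is present on this machine.
if [ -r /etc/kde4/kdm/kdmrc ]; then
    kdm_server_timeout /etc/kde4/kdm/kdmrc
fi
```

If nothing is printed, the section does not set ServerTimeout and KDM falls back to its built-in default.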
Bug#583312: [Pkg-sysvinit-devel] Bug#583312: possible fix
[Romane]
> Changed the S17nvidia-glx to S7nvidia-glx, and the beastie booted up to a login screen no problems, and at about the same speed (give or take) as before the installation of the initscripts and sysv... stuff yesterday.

As far as I know, changing the sequence number of a script will not affect parallel booting. Did you use S7nvidia-glx or S07nvidia-glx? If the former, I suspect this caused the script not to run at all during boot. I suspect there is some race issue causing some but not all boots to fail.

By the way, what is the name of the package providing /etc/init.d/nvidia-glx on your machine (dpkg -S /etc/init.d/nvidia-glx)?

> Now, this is not optimal - some update at some time in the future will again reorder the scripts, and the problem is very likely to repeat itself.

Please try the adjusted header for nvidia-glx I posted earlier, and let me know if it helps.

Happy hacking,
--
Petter Reinholdtsen
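One reason S7 and S07 are not interchangeable is plain string ordering: names in a directory are compared character by character, so a single-digit S7 lands after every two-digit entry instead of before them. The sketch below only demonstrates that sort order. The function name rc_name_order is made up for illustration, and whether a given boot system actually follows this order is a separate question (under dependency-based booting the sequence number may not matter at all, as noted above).

```shell
# Hypothetical helper: sort rc-style link names the way a shell glob
# in the C locale would list them.
rc_name_order() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

# The renamed S7 entry sorts last, not first.
rc_name_order S01nvidia-kernel S14portmap S15nfs-common S7nvidia-glx

# With a leading zero, the entry sorts before S14 and S15 as intended.
rc_name_order S01nvidia-kernel S14portmap S15nfs-common S07nvidia-glx
```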
Bug#583312: [Pkg-sysvinit-devel] Bug#583312: possible fix
Good morning Petter

I must thank you for your patience.

> As far as I know, changing the sequence number of a script will not affect parallel booting. Did you use S7nvidia-glx or S07nvidia-glx? If the former, I suspect this caused the script to not run at all during boot. I suspect there is some race issue causing some but not all boots to fail.

Time has proven you correct, and my approach a failure anyway (hangs head). I had used S7, but even changing it to S07 made no difference. It is now back to what it started at: S17nvidia-glx. Basically, whatever went right earlier on is now not going right; I am back to being dropped to the console.

> Btw, what is the name of the package providing /etc/init.d/nvidia-glx on your machine (dpkg -S /etc/init.d/nvidia-glx).

$ dpkg -S /etc/init.d/nvidia-glx
nvidia-glx: /etc/init.d/nvidia-glx

Installed from the repositories, but drivers downloaded from nVidia are affected also.

> Please try the adjusted header for nvidia-glx I posted earlier, and let me know if it helps.

Made that change, and no change to the boot issue. Rebooted a number of times to make sure it was not a one-off.

So far, the only thing that seems to get me through is the change to /etc/kde4/kdm/kdmrc mentioned in my earlier email, with a timeout value of 120. After my earlier overconfident assurance that I had possibly found a solution, I won't say that I have it fixed, but over the test boots just now done, each boot was successful at getting to the login screen. The time from when the screen goes blank to when I am presented with the logon screen varies from 15 to 18 seconds (give or take). Before, it was tossing me to the console after about 12 to 15 seconds (give or take). On those grounds, at least things seem to be working, even if not as they should :)

I have read in my searches that making this change is not the preferred method, but ... - I can also make a coffee while I wait (laughing).

I have another machine on which I had been holding off making this update. Took the plunge a little earlier, and it came up on the first boot. Also an nVidia card. My 3 machines are always set up identically - I can hop from one to the other without having to remember which machine I am sitting at - different hardware, but the same system otherwise. So, not a consistent issue, as you suggested in your reply to me. The third machine is not affected - ATI, not using proprietary drivers.

I have reached the end of my own options and limited knowledge (getting old and forgetful :)), but am most happy to use this machine to debug the issue with you if that will help further improve an already superb distribution. Crashing it is not an issue - if worst comes to worst, I can reformat and reinstall (grinning).

Have babbled sufficiently for now

Happy hacking,
With greetings
Romane
Bug#583312: [Pkg-sysvinit-devel] Bug#583312: possible fix
reassign 583312 nvidia-glx
severity 583312 serious
thanks

[Romane]
> I have reached the end of my own options and limited knowledge (getting old and forgetful :)), but am most happy to use this machine to debug the issue with you if that will help improve further an already superb distribution. Crashing it is not an issue - worst comes to worst, can reformat, reinstall (grinning).

I had a look at the open bugs against nvidia-glx, and came across #521699, which seems similar to your problem. Reassigning to the nvidia-glx package to get input from the maintainers of that package, and because I believe the problem is in that package. Setting severity to serious, based on the assumption that this problem will affect all users with parallel booting now enabled by default.

Does it work to add, for example, 'sleep 5' at the end of the start section in /etc/init.d/nvidia-glx? Perhaps something needs more time before X is started?

Happy hacking,
--
Petter Reinholdtsen
Bug#583312: [Pkg-sysvinit-devel] Bug#583312: possible fix
Good morning Petter

> reassign 583312 nvidia-glx
> severity 583312 serious
> thanks

Thanks for passing it on. Even if not all users, a sufficient portion probably. Have replied to all in this email.

> I had a look at the open bugs against nvidia-glx, and came across #521699, which seems similar to your problem. Reassigning to the nvidia-glx package to get input from the maintainers of that package, and because I believe the problem is in that package. Setting severity to serious, based on the assumption that this problem will affect all users with parallel booting now enabled by default.
>
> Does it work to add, for example, 'sleep 5' at the end of the start section in /etc/init.d/nvidia-glx? Perhaps something needs more time before X is started?

You seem to have hit the nail on the head. I added that line to the nvidia-glx script, commented out the delay I had added in /etc/kde4/kdm/kdmrc, rebooted, and it came up to a normal logon screen without even displaying the green nVidia logo.

I then went in and made sure everything was set back to what it was when this issue started for me: took out those two lines from the nvidia-glx script that were tried earlier, and ensured that the numbering was still S17nvidia-glx in rc2.d. Rebooted. 6 times. Each time without any errors, without seeing the nVidia logo, and the boot was acceptably and perceptibly quicker than it was even before the update of the initscripts yesterday.

The only thing I couldn't revert was whatever changes running update-rc.d made earlier in the day (see history of problem).

I checked the various logs, and was unable to see anything that may help. Is there anything I can do to help isolate this further?

I ran /usr/share/insserv/make-testsuite again, and have attached the output in case it is of any use.

After the to-ing and fro-ing during the course of the day, I am now inclined to accept your view that the problem is in the nvidia package.
Happy hacking,
With greetings
Romane

set +C
cat <<'EOF' > $insconf
$local_fs       +mountall +mountoverflowtmp +umountfs
$network        +networking +ifupdown
$named          +named +dnsmasq +lwresd +bind9 $network
$remote_fs      $local_fs +mountnfs +mountnfs-bootclean +umountnfs +sendsigs
$syslog         +rsyslog +sysklogd +syslog-ng +dsyslog +inetutils-syslogd
$portmap        portmap
$time           +hwclock
<interactive>   glibc udev console-screen keymap keyboard-setup console-setup cryptdisks cryptdisks-early checkfs-loop
EOF
set -C

addscript acpid <<'EOF'
### BEGIN INIT INFO
# Provides:          acpid
# Required-Start:    $remote_fs $syslog
# Required-Stop:     $remote_fs $syslog
# X-Start-Before:    kdm gdm xdm hal
# X-Stop-After:      kdm gdm xdm hal
# Default-Start:     2 3 4 5
# Default-Stop:
# Short-Description: Start the Advanced Configuration and Power Interface daemon
# Description:       Provide a socket for X11, hald and others to multiplex
#                    kernel ACPI events.
### END INIT INFO
EOF

addscript atd <<'EOF'
### BEGIN INIT INFO
# Provides:          atd
# Required-Start:    $syslog $time $remote_fs
# Required-Stop:     $syslog $time $remote_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Deferred execution scheduler
# Description:       Debian init script for the atd deferred executions
#                    scheduler
### END INIT INFO
EOF

addscript bootlogd <<'EOF'
### BEGIN INIT INFO
# Provides:          bootlogd
# Required-Start:    mountdevsubfs
# X-Start-Before:    hostname keymap keyboard-setup procps pcmcia hwclock hwclockfirst hdparm hibernate-cleanup lvm2
# Required-Stop:
# Default-Start:     S
# Default-Stop:
# Short-Description: Start or stop bootlogd.
# Description:       Starts or stops the bootlogd log program
#                    which logs boot messages.
### END INIT INFO
EOF

addscript bootlogs <<'EOF'
### BEGIN INIT INFO
# Provides:          bootlogs
# Required-Start:    hostname $local_fs
# Required-Stop:
# Should-Start:      $x-display-manager gdm kdm xdm ldm sdm wdm nodm
# Default-Start:     1 2 3 4 5
# Default-Stop:
# Short-Description: Log file handling to be done during bootup.
# Description:       Various things that don't need to be done particularly
#                    early in the boot, just before getty is run.
### END INIT INFO
EOF

addscript bootmisc.sh <<'EOF'
### BEGIN INIT INFO
# Provides:          bootmisc
# Required-Start:    $remote_fs
# Required-Stop:
# Should-Start:      udev
# Default-Start:     S
# Default-Stop:
# Short-Description: Miscellaneous things to be done during bootup.
# Description:       Some cleanup. Note, it need to run after mountnfs-bootclean.sh.
### END INIT INFO
EOF

addscript checkfs.sh <<'EOF'
### BEGIN INIT INFO
# Provides:          checkfs
# Required-Start:    checkroot
# Required-Stop:
# Should-Start:      mtab
# Default-Start:     S
# Default-Stop:
# X-Interactive:     true
# Short-Description: Check all filesystems.
### END INIT INFO
EOF

addscript checkroot.sh <<'EOF'
### BEGIN INIT INFO
#
Bug#583312: possible fix
Romane rom...@miscellanie.com writes:

>> I had a look at the open bugs against nvidia-glx, and came across #521699, which seems similar to your problem. Reassigning to the nvidia-glx package to get input from the maintainers of that package, and because I believe the problem is in that package. Setting severity to serious, based on the assumption that this problem will affect all users with parallel booting now enabled by default.

I think parallel booting is a red herring. The init script for nvidia-glx has nothing to do with the operation of the X server (take a look at what it does).

> You seem to have hit the nail on the head. Added that line to the script nvidia-glx, commented out the delay I had added in /etc/kde4/kdm/kdmrc, rebooted, and it came up to a normal logon screen without even displaying the green nVidia logo.
>
> I then went in and made sure everything was set back to what it was when this issue started for me - took out those two lines from the nvidia-glx script that were tried earlier, ensured that numbering was still S17nvidia-glx in rc2.d. Rebooted. 6 times. Each time without any errors, without seeing the nVidia logo, and boot was acceptably and perceptibly quicker than what was before even the update of the initscripts yesterday.
>
> Only things couldn't change was whatever changes running update-rc.d made earlier in the day (see history of problem).

If you're experiencing a variant of #521699, then the problem is that the timeout in KDM is too short. You need to tell KDM to wait longer; the NVIDIA driver takes longer to initialize the card than KDM is willing to wait. I suspect that the only thing that parallel booting is doing is starting kdm sooner and hence giving the NVIDIA module even less time to initialize the hardware.

See #568969 for the timeout fix that worked for GDM. It appears to no longer be a problem with GDM 3 (or at least it's not reproducible for us).

However, it's possible that my understanding here is not complete.

I don't see any obvious way that we can fix this on the NVIDIA side if I'm understanding the problem correctly. It takes as long as it takes to initialize the video card, and the nvidia-glx init script is superfluous and is going away, so adding delays to it won't work (and I'm skeptical that's a reliable solution anyway).

--
Russ Allbery (r...@debian.org) http://www.eyrie.org/~eagle/
Bug#583312: possible fix
[Russ Allbery]
> If you're experiencing a variant of #521699, then the problem is that the timeout in KDM is too fast. You need to tell KDM to wait longer; it takes the NVIDIA driver longer to initialize the card than it's willing to wait for. I suspect that the only thing that parallel booting is doing is starting kdm sooner and hence giving the NVIDIA module even less time to initialize the hardware.

What is loading the nvidia driver? When is it done?

If it is done by some init.d script, the init.d script should not exit until the initialization is done, to make sure those scripts depending on it will work.

Happy hacking,
--
Petter Reinholdtsen
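The principle in this message (an init.d script should only return once whatever it starts is actually ready) is usually implemented with a bounded wait rather than a fixed sleep. The following is a generic sketch only: the function name wait_for_path and the marker path in the usage comment are hypothetical, and nothing in this thread establishes that the nvidia packages expose such a readiness marker.

```shell
# Hypothetical helper: poll once per second until path $1 exists,
# giving up after $2 seconds. Returns 0 if the path appeared, 1 on timeout.
wait_for_path() {
    path=$1
    timeout=$2
    waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example use at the end of a start section (the path is made up):
#   wait_for_path /run/example-service.ready 10 || echo "not ready after 10s" >&2
```

Compared with an unconditional 'sleep 5', this returns as soon as the condition holds and reports a failure instead of silently continuing when it does not.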
Bug#583312: [pkg-nvidia-devel] Bug#583312: possible fix
Petter Reinholdtsen p...@hungry.com writes:

> [Russ Allbery]
>> If you're experiencing a variant of #521699, then the problem is that the timeout in KDM is too fast. You need to tell KDM to wait longer; it takes the NVIDIA driver longer to initialize the card than it's willing to wait for. I suspect that the only thing that parallel booting is doing is starting kdm sooner and hence giving the NVIDIA module even less time to initialize the hardware.

> What is loading the nvidia driver? When is it done?

It's loaded dynamically by the X server when it starts. These days, I believe that's done via the device mappings provided in the nvidia-kernel-common package, which alias char-major-195* to the nvidia kernel module, although I'm not deeply familiar with the details of how dynamic hardware initialization is handled. But the kernel module is not loaded until the X server is started, and it's loaded automatically at that point.

> If it is done by some init.d script,

It's not, unless the mknod commands in the nvidia-kernel init script are doing some sort of deep magic that I'm fairly sure they're not. There's definitely no explicit call to modprobe anywhere in an init script provided by NVIDIA packages.

--
Russ Allbery (r...@debian.org) http://www.eyrie.org/~eagle/
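Whether the module is already present before X starts can be checked from the console by looking at the kernel's module list. This is a small sketch, with the helper name module_loaded made up for illustration. It reads /proc/modules by default and accepts an alternative file as a second argument so it can be tried without touching a live system.

```shell
# Hypothetical helper: succeed if module $1 appears in the module list.
# $2 is optional and defaults to /proc/modules.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Report the state of the nvidia module on this machine.
if module_loaded nvidia; then
    echo "nvidia module is loaded"
else
    echo "nvidia module is not loaded"
fi
```

Run once from the console after a failed start and once after a successful login, this would show whether the load happens only when the X server comes up, as described above.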