Bug#1020290: init-system-helpers depends on usrmerge | usr-is-merged

2022-09-21 Thread Craig Sanders
reopen 1020290
severity 1020290 critical
stop

There's no need to repeat (again!) what i've already said.

fix the bug before closing this report.



Bug#1020290: closed by Ansgar (Re: Bug#1020290: init-system-helpers depends on usrmerge | usr-is-merged)

2022-09-21 Thread Craig Sanders
reopen 1020290
severity 1020290 critical
stop

> Please stop playing BTS ping-pong.

STOP CLOSING BUG REPORTS WITHOUT FIXING THE PROBLEM.

> Either way it looks like you misconfigured your apt sources as you said
> earlier:

NO. I DID NOT MISCONFIGURE MY SYSTEM.

IT DOESN'T MATTER HOW MANY TIMES YOU CLAIM IT HAS BEEN MISCONFIGURED, IT IS
NOT.  READ THE BUG REPORT.  PAY ATTENTION TO WHAT HAS BEEN STATED SEVERAL
TIMES NOW.  AT THE VERY LEAST DO YOUR OWN TESTS TO CONFIRM OR DISPROVE IT,
NOT JUST DISMISS IT WITH UNSUPPORTED STATEMENTS OF OPINION.

> # apt-get -d -u install usrmerge
> Reading package lists... Done
> Building dependency tree... Done
> Reading state information... Done
> Package usrmerge is not available, but is referred to by another package.
> This may mean that the package is missing, has been obsoleted, or
> is only available from another source
>
> That is obviously not a bug in the package. So find out why usrmerge is
> not available and then install it. It will probably also be picked up
> then.

THE SAME PROBLEM OCCURRED ON FIVE DIFFERENT SYSTEMS.

THE COPY-PASTE ABOVE IS FROM A BOG STANDARD SID VM, TOTALLY UNFUCKED WITH.

And, as I noted in a previous msg, the usrmerge package IS available,
"apt-cache show" finds it without a problem, but apt wants to install
usr-is-merged.

> But usrmerge does exist, it is available, there is an installation candidate:
>
> # apt-cache show usrmerge
> Package: usrmerge
> Version: 30+nmu1
> Installed-Size: 39
> Maintainer: Marco d'Itri
> Architecture: all
> Provides: usr-is-merged
> Depends: perl:any, libfile-find-rule-perl

It's not my responsibility to figure out what broken interactions your broken
package has with usrmerge, usr-is-merged, and apt itself.  IT'S YOUR PACKAGE,
YOUR RESPONSIBILITY.

craig

--
craig sanders 



Bug#1020290: init-system-helpers depends on usrmerge | usr-is-merged

2022-09-21 Thread Craig Sanders
On Wed, Sep 21, 2022 at 08:32:53AM +0200, Helmut Grohne wrote:
> On Wed, Sep 21, 2022 at 10:41:18AM +1000, Craig Sanders wrote:
> > Stop closing this bug without fixing it.
>
> The change you are objecting to was planned and presented to various
> teams in Debian including the technical committee and the release team.

This bug report is ABOUT A BROKEN PACKAGE that causes apt upgrade and
apt dist-upgrade etc to fail.

IT IS NOT ABOUT THE USR MERGE DECISION.  THAT IS ANOTHER ISSUE
ENTIRELY - IT HAPPENS TO BE THE ULTIMATE CAUSE OF THE BUG BEING
CREATED BUT THE BUG IS SEPARATE AND CAN BE FIXED WITHOUT REVISITING
ANY DECISIONS.

I'M NOT EXPECTING ANYONE TO REVERSE THE DECISION.

JUST FIX THE FUCKING PACKAGE SO IT DOESN'T BREAK APT.

> The change is implementing a technical committee decision and has been
> agreed to by the technical committee and the release team. You should be
> able to find the rationale in the relevant technical committee bug
> reports. If you want to change this, please file your objection with the
> technical committee or start a GR instead of reopening this bug over and
> over again.

The technical committee decision does not mandate an incompetent
implementation that breaks apt.

No matter how fucked up and stupid the decision was, it is possible to
implement it without breaking apt.

> > It is breaking upgrades and dist-upgrades etc.
>
> As others have pointed out, this is breaking on your side, because

These "others" you mention are making shit up because they don't want to even
investigate the problem, let alone fix it.

As I have pointed out several times now, THIS IS **NOT** SOMETHING JUST
HAPPENING ON MY SIDE DUE TO SOME ODDITY WITH MY SYSTEMS' CONFIGURATIONS.  MY
SYSTEMS ARE NOT MISCONFIGURED.  THE BUG OCCURS ON STANDARD DEBIAN SID VMS.

IF YOU THINK I AM WRONG, THEN DO SOME TESTING TO CONFIRM OR DISPROVE IT
YOURSELF.

> usrmerge is somehow prevented from being installable. Please figure out
> why that happens. I suppose the most likely cause would be use of
> dpkg-fsys-usrunmess. Refer to the new faq entry at
> https://wiki.debian.org/UsrMerge on how to revert that.

I have never run anything like that, so there is nothing to revert.  I was
barely even aware of this usr merge bullshit happening because it is not
important to me.

The ONLY reason I'm aware of it now is because, during a routine dist-upgrade,
init-system-helpers 1.65.2 broke upgrades on several of my systems, breakage
which was only fixable by reverting to 1.64.  AS WAS DESCRIBED IN MY INITIAL
BUG REPORT.

craig

--
craig sanders 



Bug#1020290: init-system-helpers depends on usrmerge | usr-is-merged

2022-09-20 Thread Craig Sanders
reopen 1020290
severity 1020290 critical
stop

> Given you have made usrmerge uninstallable in your system as you
> admitted (probably running one of dpkg's unsupported scripts that
> installs a local blocking package, I'd imagine) then it is entirely on
> you to fix that, it cannot be done remotely. Please stop spewing abuse
> and playing games with the BTS, it is not going to achieve anything.

I did not do that.  And I stated very clearly that I didn't.

You asked, and I replied:

> > did you block it locally somehow?
> Nope.  I had no idea this stealth-mandatory change was being forced.

stop twisting my words to avoid fixing your bug.

I said that I probably *would* (note: future tense, an indication of intent)
do something to block usrmerge on my systems somehow.

And I also said that it wasn't necessary to do so because your package was
broken.

I then gave another example of the problem occurring on a minimalist standard
sid VM.



Stop closing this bug without fixing it.

It is breaking upgrades and dist-upgrades etc.



Bug#1011146: (fwd) dlocate is marked for autoremoval from testing

2022-05-26 Thread Craig Sanders
and dlocate:

- Forwarded message from Debian testing autoremoval watch -

Date: Thu, 26 May 2022 04:45:56 +
From: Debian testing autoremoval watch 
To: dloc...@packages.debian.org
Subject: dlocate is marked for autoremoval from testing

dlocate 1.10 is marked for autoremoval from testing on 2022-06-30

It (build-)depends on packages with these RC bugs:
1011146: nvidia-graphics-drivers-tesla-470: CVE-2022-28181, CVE-2022-28183, CVE-2022-28184, CVE-2022-28185, CVE-2022-28191, CVE-2022-28192
 https://bugs.debian.org/1011146



This mail is generated by:
https://salsa.debian.org/release-team/release-tools/-/blob/master/mailer/mail_autoremovals.pl

Autoremoval data is generated by:
https://salsa.debian.org/qa/udd/-/blob/master/udd/testing_autoremovals_gatherer.pl

- End forwarded message -


dlocate doesn't build-depend on nvidia-graphics-drivers-tesla-470 or anything
similar, or on anything even related to nvidia.  Here's its debian/control
file:


$ cat debian/control
Source: dlocate
Section: utils
Priority: optional
Maintainer: Craig Sanders 
Standards-Version: 3.7.2.1
Build-Depends: debhelper (>= 13)

Package: dlocate
Architecture: all
Depends: dctrl-tools | grep-dctrl (>= 0.11), dpkg (>= 1.8.0), ${perl:Depends}, ${misc:Depends}
Recommends: supercat
Description: fast alternative to dpkg -L and dpkg -S
 Uses GNU grep and text dumps of dpkg's data to greatly speed up finding
 out which package a file belongs to (i.e. a very fast dpkg -S). Many
 other uses, including options to view all files in a package, calculate
 disk space used, view and check md5sums, list man pages, etc.
Homepage: https://github.com/craig-sanders/dlocate



craig



Bug#880902: [Pkg-zfsonlinux-devel] Bug#880902: RC bug on zfs-linux that has to be fixed

2017-11-16 Thread Craig Sanders
On Thu, Nov 16, 2017 at 12:10:38PM +0100, Raphael Hertzog wrote:
> This bug should be quickly fixed because ZFS is broken in Debian Testing
> right now, spl-linux migrated already and zfs-linux did not migrate due to
> this bug.

This bug definitely needs to be fixed but it shouldn't be a big panic-inducing
deal for anyone following sane practice with the zfs packages anyway.

By "sane practice", I mean that the zfs packages should **always** be held
until you manually unhold them immediately prior to upgrading them, and then
immediately hold them again.

Ditto for linux-image-$arch and linux-headers-$arch packages on any system that
needs dkms modules.

Why?

Because zfs-dkms and spl-dkms almost always need to be updated for new kernel
versions.  Just letting them upgrade automatically via apt-get upgrade or
dist-upgrade is a recipe for a broken system.  So far, while new kernels
almost always break older zfs packages, newer zfs packages tend to compile OK
on older kernels, but that's not at all guaranteed to be the case.  It's a
pretty safe bet, but not one I'd be willing to take when it could mean being
unable to access my data or even boot my zfs-root systems.

Actually, that's true for most, if not all, -dkms packages.  So the safe thing
to do is to hold kernel and dkms packages.


BTW, I use the following script to list (default), verbose list (-v), hold
(-h), and unhold (-u) zfs related packages:


#!/bin/bash
#
# script: list-zfs.sh
# author: Craig Sanders <c...@taz.net.au>
# license: Public Domain (this script is too trivial to be anything else)

# options:
# default/none  list the installed ZoL packages, one per line
# -v  verbose (dpkg -l) list the packages
# -h  hold the packages with apt-mark
# -u  unhold the packages with apt-mark

# build an array of currently-installed zfs packages.
# this would be better with grep-status from dctrl-tools, but dpkg is
# guaranteed to be on every debian system while dctrl-tools isn't.
PKGS=( $(dpkg -l '*libnvpair*linux' '*libuutil*linux*' '*zfs*' '*zpool*' \
           'spl' 'spl-dkms' 2>/dev/null | awk '/^.i/ {print $2}') )

if [ "$1" == "-v" ] ; then
  dpkg -l "${PKGS[@]}"
elif [ "$1" == "-h" ] ; then
  apt-mark hold "${PKGS[@]}"
elif [ "$1" == "-u" ] ; then
  apt-mark unhold "${PKGS[@]}"
else
  printf "%s\n" "${PKGS[@]}"
fi

I've got a similar script for nvidia related packages.  It's exactly the same
except for the PKGS array.


For kernels, I keep them on hold until I want to (manually) upgrade them:

apt-get -u install linux-headers-amd64 linux-image-amd64 ; \
  apt-mark hold linux-headers-amd64 linux-image-amd64
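The hold/unhold routine described above (unhold immediately before upgrading, then hold again straight afterwards) can be wrapped in a small helper. This is a sketch under stated assumptions, not part of the original script: `run`, `upgrade_held` and the `DRY_RUN` variable are hypothetical names, and the helper re-holds the packages even when the install step fails.

```shell
#!/bin/bash
# Hypothetical helper for the hold routine: unhold, upgrade, then re-hold.
# Set DRY_RUN=1 to print the commands instead of running them.

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# upgrade_held PKG...: unhold PKGs, install/upgrade them, then hold them again.
upgrade_held() {
  if (( $# == 0 )); then
    echo "usage: upgrade_held PKG..." >&2
    return 2
  fi
  run apt-mark unhold "$@" || return
  local rc=0
  # Re-hold even if the install fails, so the packages are never left unheld.
  run apt-get -u install "$@" || rc=$?
  run apt-mark hold "$@" || return
  return "$rc"
}
```

For example, `DRY_RUN=1 upgrade_held linux-headers-amd64 linux-image-amd64` only prints the three commands; without `DRY_RUN=1` (and as root) it performs the upgrade.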



packages make systems administration easier. they're not a substitute for it.

craig

ps: this bug is an example of why testing is actually worse than sid for
real-world (non-testing) usage.  This bug report, while necessary, actually
delayed the migration of the updated zfs packages that would have resolved
it.  That's an unavoidable side-effect of bug reports against packages in sid.
IIRC, packages only get migrated from sid to testing if there hasn't been a
bug reported against them for 14 (? not sure exactly) days.


--
craig sanders <c...@taz.net.au>

BOFH excuse #112:

The monitor is plugged into the serial port



Bug#813339: breaking an essential service is bad enough, failure to document just compounds the error.

2016-02-08 Thread Craig Sanders
making changes to how things work is reasonable and perfectly normal.

completely failing to document those changes, resulting in broken
servers (and, given that this is a dhcp server, broken networks) is not.

there isn't even a single mention of this change ANYWHERE in the
updated package, not in the /etc/default file, and not under
/usr/share/doc/isc-dhcp-server.


/usr/share/doc/isc-dhcp-server# find . -type f -exec zgrep -i INTERFACESv[46] {} +
/usr/share/doc/isc-dhcp-server#


i had to discover this change myself by running the init.d script with
'bash -x' (good thing i hadn't jumped on the systemd bandwagon - shell
scripts are beneficial, not a hassle)...only then did i have enough info
to figure out what the problem was and enough info to search the bug
tracker.
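A quick way to catch this on an upgraded server is to check whether the defaults file still sets only the old variable. This is a hypothetical sketch: it assumes the pre-change file used a plain `INTERFACES="..."` line and that the new names are `INTERFACESv4`/`INTERFACESv6`, as the zgrep above suggests; `check_dhcp_defaults` is an illustrative name, not something shipped by the package.

```shell
#!/bin/bash
# Hypothetical check for the isc-dhcp-server defaults file: warn if it only
# sets the old INTERFACES variable and neither INTERFACESv4 nor INTERFACESv6.
# Assumption: the pre-change file used a plain INTERFACES="..." line.
# Returns 0 if OK, 1 if only the old variable is set, 2 if unreadable.

check_dhcp_defaults() {
  local file=${1:-/etc/default/isc-dhcp-server}
  if [[ ! -r $file ]]; then
    echo "cannot read $file" >&2
    return 2
  fi
  # Commented-out lines (starting with #) do not match these patterns.
  if grep -Eq '^[[:space:]]*INTERFACESv[46]=' "$file"; then
    echo "ok: INTERFACESv4/INTERFACESv6 present"
  elif grep -Eq '^[[:space:]]*INTERFACES=' "$file"; then
    echo "warning: only the old INTERFACES variable is set"
    return 1
  else
    echo "no INTERFACES* variables set"
  fi
}
```

Running `check_dhcp_defaults` with no argument checks `/etc/default/isc-dhcp-server`.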

so, please, if you're going to make major changes to how something works
then at least make some minimal effort to document the changes.

thanks,

craig

-- 
craig sanders <c...@taz.net.au>



Bug#807015: xinit: startx freezes, mouse and keyboard don't work

2015-12-04 Thread Craig Sanders
Package: xinit
Version: 1.3.4-3
Severity: grave
Justification: renders package unusable

sometime in the last ~70 days (since I last started X or rebooted),
something has changed in X that prevents startx from working as an
ordinary user.

startx *was* working perfectly.  Now when I run startx, I can see the
xfce desktop but neither keyboard nor mouse work at all, can't even
switch VT with Ctrl-Alt-F1 to kill X.  I have to login from another
machine on the network to kill X and get back to a text console.

seems to be something to do with the error message:

  xf86EnableIOPorts: failed to set IOPL for I/O (Operation not permitted)


here's the entire log from 'startx > startx.log 2>&1':

X.Org X Server 1.17.3
Release Date: 2015-10-26
X Protocol Version 11, Revision 0
Build Operating System: Linux 3.16.0-4-amd64 x86_64 Debian
Current Operating System: Linux ganesh 3.19-5.dmz.1-liquorix-amd64 #1 ZEN SMP PREEMPT Debian 3.19-5 (2015-04-19) x86_64
Kernel command line: BOOT_IMAGE=/vmlinuz-3.19-5.dmz.1-liquorix-amd64 root=/dev/md1 ro iommu=noagp iommu=noagp
Build Date: 27 October 2015  11:41:02PM
xorg-server 2:1.17.3-2 (http://www.debian.org/support) 
Current version of pixman: 0.33.4
Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/home/cas/.local/share/xorg/Xorg.1.log", Time: Fri Dec  4 18:47:52 2015
(==) Using config file: "/etc/X11/xorg.conf"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
xf86EnableIOPorts: failed to set IOPL for I/O (Operation not permitted)



-- System Information:
Debian Release: stretch/sid
  APT prefers unstable
  APT policy: (990, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.19-5.dmz.1-liquorix-amd64 (SMP w/6 CPU cores; PREEMPT)
Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Init: sysvinit (via /sbin/init)

Versions of packages xinit depends on:
ii  coreutils   8.23-4
ii  libc6       2.21-1
ii  libx11-6    2:1.6.3-1
ii  x11-common  1:7.7+12
ii  xauth       1:1.0.9-1

Versions of packages xinit recommends:
ii  lxde-common [x-session-manager]       0.99.0-2
ii  lxsession [x-session-manager]         0.5.1-2
ii  metacity [x-window-manager]           1:3.18.1-1
ii  openbox [x-window-manager]            3.6.1-2
ii  roxterm [x-terminal-emulator]         3.2.1-1
ii  terminator [x-terminal-emulator]      0.97-4
ii  xfce4-session [x-session-manager]     4.12.1-3
ii  xfce4-terminal [x-terminal-emulator]  0.6.3-2
ii  xfwm4 [x-window-manager]              4.12.3-1
ii  xserver-xorg [xserver]                1:7.7+12
ii  xterm [x-terminal-emulator]           320-1

xinit suggests no packages.

-- no debconf information



Bug#807015: xinit: startx freezes, mouse and keyboard don't work

2015-12-04 Thread Craig Sanders
On Fri, Dec 04, 2015 at 01:57:53PM +0100, Laurent Bigonville wrote:
> Are you using systemd? 

Nope, this particular machine is still sysvinit.

It also has over 20 years worth of cruft on it, as i first built it in
1994 and have continuously upgraded it (with debian unstable) ever since.

> is libpam-systemd installed on your machine?

Yes, because without it several important packages would be uninstalled,
including libvirt packages for some unfathomable reason - it's impossible to
have a completely non-systemd machine in debian, you can either have systemd
or you can have a hybrid of systemd + whatever else.  systemd, or at least
parts of it, is mandatory.


It is not enabled in /etc/pam.d/ though.  None of the files in there use
it.

and, yes, I have tried it with libpam_systemd enabled.  Makes no difference.


> If it's not the case, try to install xserver-xorg-legacy and look at
> Xwrapper.config man page

What good would that do?  What would it fix, and how? I am running
neither legacy drivers nor non-linux kernels.

startx worked without this until recently, i'd rather not digress into
installing and configuring random packages unless there's a good and
clearly defined reason for it.


craig

-- 
craig sanders <c...@taz.net.au>



Bug#807015: xinit: startx freezes, mouse and keyboard don't work

2015-12-04 Thread Craig Sanders
I have managed to get startx working again, but I'm not sure if it's the
correct way.  I added my user to the 'input' group after noticing the
following in /home/cas/.local/share/xorg/Xorg.1.log

[ 24633.131] (**) evdev: Dell Dell USB Keyboard: Device: "/dev/input/event1"
[ 24633.131] (EE) evdev: Dell Dell USB Keyboard: Unable to open evdev device "/dev/input/event1".

$ ls -l /dev/input/event1
crw-rw---- 1 root input 13, 65 Dec  4 11:57 /dev/input/event1

and similar for the mouse.
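To check whether an account already has access to an evdev node such as /dev/input/event1 through group membership, a minimal sketch compares the node's owning group with the current user's groups. It assumes GNU coreutils (`stat -c %G`, `id -nG`); the function name `in_device_group` is hypothetical.

```shell
#!/bin/bash
# Sketch: is the current user a member of the group that owns a device node?
# Assumes GNU coreutils: `stat -c %G` prints the owning group's name and
# `id -nG` prints the names of all groups the current user belongs to.
# Returns 0 if a member, 1 if not, 2 if the node can't be stat'ed.

in_device_group() {
  local dev=$1 group g
  group=$(stat -c %G -- "$dev") || return 2
  for g in $(id -nG); do
    if [[ $g == "$group" ]]; then
      return 0
    fi
  done
  return 1
}
```

For example, `in_device_group /dev/input/event1 || echo "not in $(stat -c %G /dev/input/event1)"`. A newly added group membership only takes effect at the next login.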


Doing this allows the mouse and keyboard to work with startx but it
seems a rather hackish and possibly completely incorrect solution.  I'm
going to have to do some research on where this input group came from
and whether it's appropriate for ordinary user accounts to be members.
I suspect it's a bad idea to do that.

I'm still getting the "xf86EnableIOPorts: failed to set IOPL for
I/O (Operation not permitted)" error message in startx.log - and
so far Google has been absolutely useless in revealing what that's
about...several people have complained about it but no-one has come up
with a definitive reason for it.


craig

-- 
craig sanders <c...@taz.net.au>



Bug#657560: #657560: apt: ...i18n_Translation-en Encountered a section with no Package: header

2012-03-31 Thread Craig Sanders
On Tue, Mar 06, 2012 at 12:15:38PM +0100, David Kalnischkies wrote:
> No, i am saying that apt asks for Translation-en and the
> server responds with Translation-en.bz2. APT knows that
> it can't request the bz2 file if bzip2 isn't available,
> but some mirrors try to be helpful and clever… See also
> http://en.wikipedia.org/wiki/Content_negotiation
>
> Only a few mirrors seem to be configured to act this way, at least
> de, de2 and us are not affected as far as i have tested. (which is
> why i was never hit by that bug after implementing the request for
> uncompressed files to support users with local mirrors which actually
> have these files available)

good explanation, thanks.

but even given the content negotiation issue, apt-get somehow recognises
that the Translation-en file is actually bzipped and tries to decompress
it. if bzip2 is installed, it succeeds. if not, it fails.

Also, without bzip2 installed, neither aptitude nor apt-cache or other
tools will have full descriptions available.  So users can't find out
the details on a package before they install it, and 'apt-cache search'
can't search on the non-existent full-text description.

IMO, bumping bzip2 to Recommends isn't any kind of an over-reaction.
without bzip2, the functionality of apt (and others) is impaired.
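One way to tell whether a downloaded index such as Translation-en is really bzip2 data, whatever its name, is to look at its first bytes: bzip2 streams begin with the magic "BZh". A minimal sketch; `is_bzip2` is a hypothetical name, and this is not necessarily how apt itself detects the compression.

```shell
#!/bin/bash
# Sketch: detect bzip2-compressed content by its magic bytes ("BZh"),
# regardless of the file name.
# Returns 0 for bzip2 data, 1 otherwise, 2 if the file can't be read.

is_bzip2() {
  local magic
  magic=$(head -c 3 -- "$1" 2>/dev/null) || return 2
  [[ $magic == "BZh" ]]
}
```

For example, `is_bzip2 ./Translation-en && echo "bzip2 data despite the name"`.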



BTW, the content negotiation configuration must be a lot more common
than you think - i've tried three mirrors, and they all behaved the
same. one was the aarnet.edu.au mirror, and the other two were my
own mirrors at work and at home.  The latter two at least I know are
standard debian apache2 default installs (negotiation module enabled by
default, and without any content-neg customisations)

$ cat /etc/apache2/mods-enabled/negotiation.conf
<IfModule mod_negotiation.c>
#
# LanguagePriority allows you to give precedence to some languages
# in case of a tie during content negotiation.
#
# Just list the languages in decreasing order of preference. We have
# more or less alphabetized them here. You probably want to change this.
#
LanguagePriority en ca cs da de el eo es et fr he hr it ja ko ltz nl nn no pl pt pt-BR ru sv tr zh-CN zh-TW

#
# ForceLanguagePriority allows you to serve a result page rather than
# MULTIPLE CHOICES (Prefer) [in case of a tie] or NOT ACCEPTABLE (Fallback)
# [in case no accepted languages matched the available variants]
#
ForceLanguagePriority Prefer Fallback

</IfModule>


craig

-- 
craig sanders c...@taz.net.au

BOFH excuse #242:

Software uses US measurements, but the OS is in metric...



--
To UNSUBSCRIBE, email to debian-bugs-rc-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#612956: nvidia-glx requires xorg-video-abi-8.0 but new xserver-xorg-core provides xorg-video-abi-8

2011-02-18 Thread Craig Sanders
more detail on xorg-video-abi-8.0 vs xorg-video-abi-8

xserver-xorg-core 2:1.9.4-1 Provides both of them, but 2:1.9.4-2
Provides only xorg-video-abi-8:

Package: xserver-xorg-core
Version: 2:1.9.4-1
Provides: xorg-input-abi-11, xorg-input-abi-11.0, xorg-video-abi-8, xorg-video-abi-8.0

Package: xserver-xorg-core
Version: 2:1.9.4-2
Provides: xorg-input-abi-11, xorg-video-abi-8


nvidia-glx 260.19.21-1 depends on xorg-video-abi-8.0:

Package: nvidia-glx
Version: 260.19.21-1
Depends: libgl1-nvidia-glx (= 260.19.21-1), libglx-nvidia-alternatives, nvidia-kernel-260.19.21, xorg-video-abi-8.0 | xorg-video-abi-6.0 | xserver-xorg-core (<< 2:1.7.7), libc6 (>= 2.2.5)
Conflicts: fglrx-driver, nvidia-glx, nvidia-glx-legacy, nvidia-glx-legacy-173xx, nvidia-glx-legacy-71xx, nvidia-glx-legacy-96xx



so attempting to upgrade xserver-xorg-core to 1.9.4-2 fails to provide
nvidia-glx's required dependencies.
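The mismatch can be checked mechanically: split the Depends alternatives on `|`, split the Provides list on commas, and look for an exact name match. A minimal sketch; `or_dep_satisfied` is a hypothetical name, version constraints are not handled at all, and the versioned xserver-xorg-core alternative is therefore left out of the example.

```shell
#!/bin/bash
# Sketch: is an OR-dependency ("a | b | c") satisfied by a package's
# Provides: list ("x, y, z")? Names are compared exactly; version
# constraints are not handled. Prints the first matching alternative.

or_dep_satisfied() {
  local depends=$1 provides=$2 alt p
  local -a alts provs
  IFS='|' read -ra alts <<< "$depends"
  IFS=',' read -ra provs <<< "$provides"
  for alt in "${alts[@]}"; do
    alt=${alt//[[:space:]]/}        # strip all whitespace from the name
    for p in "${provs[@]}"; do
      p=${p//[[:space:]]/}
      if [[ $alt == "$p" ]]; then
        printf '%s\n' "$alt"
        return 0
      fi
    done
  done
  return 1
}
```

With the Provides lines quoted above, `or_dep_satisfied 'xorg-video-abi-8.0 | xorg-video-abi-6.0' 'xorg-input-abi-11, xorg-video-abi-8'` (the 2:1.9.4-2 list) finds no match, while the 2:1.9.4-1 list matches `xorg-video-abi-8.0`.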



not sure if this bug belongs to nvidia-glx or xserver-xorg-core, but the
solution requires co-ordination between the two packages.


craig

PS: why does nvidia-glx conflict with itself in the Conflicts: line
above? i know it's been in the package since at least 195.36.31 without
causing problems, but it seems an odd conflict to have.

'aptitude why-not nvidia-glx' says that that is the problem (it's wrong,
the problem is the missing dependency xorg-video-abi-8.0).

# aptitude why-not nvidia-glx
ih  nvidia-glx Conflicts nvidia-glx


-- 
craig sanders c...@taz.net.au






Bug#568689: pysol was discontinued in 2004, pysolfc replaces it

2010-03-13 Thread Craig Sanders
on http://www.pysol.org/:

  As of 2004 any work on PySol has stopped, and PySol is officially
  discontinued.

  Fortunately a number of enthusiastic people have continued from
  where I left off and have created the PySol Fan Club edition. Please
  contribute all your patches and enhancements to this new project.

  It has been very much fun creating this game, and I hope you will
  appreciate the result of my efforts. Share and enjoy!

the PySol Fan Club edition is at:

http://pysolfc.sourceforge.net/

and it is still actively developed. latest release was in Dec 2009.


it might be worthwhile replacing pysol with pysolfc. screenshots look
a little prettier than pysol, hopefully the playability is as good or
better.

alternatively, if it's possible, recompile pysol against the latest
python-tk (i have to Hold the pysol package to prevent the upgrade of
python-tk from removing pysol).

or both...there's probably no reason why pysol and pysolfc can't
co-exist if the gameplay differs substantially.


BTW, there are also updated card sets in PySolFC-Cardsets v.2.0.

craig

-- 
craig sanders c...@taz.net.au






Bug#550233: arpwatch 2.1a15 NMU

2009-11-11 Thread Craig Sanders
On Wed, Nov 11, 2009 at 08:52:05PM +, Simon McVittie wrote:
> It appears that your previous NMU of arpwatch introduced a couple of RC bugs
> (#550233 and #552792, both Cc'd) that you might not be aware of. Could you
> have a look at them, please?

thanks for the heads-up, i wasn't aware of these bug reports. the bug
tracker doesn't CC the NMU email address.  perhaps it should.

i'll make some time on the weekend to fix them.

i'll probably just remove the conffile status from ethercodes.dat. it
doesn't update very often, and it's not that big a deal to be running
with an older version.


> (As an aside: NMUs that are new upstream versions should be versioned
> like 2.15a-0.1 to indicate that they're NMUs.)

and i'll change the version number to 2.15a-0.2
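The versioning convention quoted above can be checked with a short function: take the Debian revision (everything after the last hyphen) and test whether it has the form 0.N. A sketch only; `is_upstream_nmu` is a hypothetical name, and it does not validate the rest of the version string.

```shell
#!/bin/bash
# Sketch: does a version look like an NMU of a new upstream version,
# i.e. is its Debian revision (the part after the last '-') of the form 0.N?
# Only the revision is checked; the rest of the version is not validated.

is_upstream_nmu() {
  local version=$1 rev
  [[ $version == *-* ]] || return 1   # no revision: native package
  rev=${version##*-}
  [[ $rev =~ ^0\.[0-9]+$ ]]
}
```

For example, `is_upstream_nmu 2.15a-0.2` succeeds, while a maintainer upload such as `2.15a-1` does not.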


craig

-- 
craig sanders c...@taz.net.au






Bug#546401: sysvinit/sysv-rc drops support for /etc/rc.boot

2009-09-12 Thread Craig Sanders
Package: sysvinit
Version: 2.87dsf-3
Severity: critical

from the changelog:

  * Drop execution of files in /etc/rc.boot from sysv-rc.  This feature
    have been obsolete since before 1999.  Remove the rc.boot(5) manual
    page from the source as well.

WTF?

WHY?

this bone-headed decision just left my entire network wide open to
the internet because my /etc/rc.boot/00firewall script didn't run
after rebooting to upgrade to kernel 2.6.31, and the flood of spambots
took down my mail server along with associated load-related problems
(hundreds of CRON jobs starved for CPU, rsyslog and named maxed out)

and it was only luck that one of my testing accounts (with an insecure
dictionary-word password) had /bin/false as the shell - otherwise the
machine would have been compromised via ssh.

Sep 12 20:44:21 taz sshd[21285]: Accepted password for USERNAME_CENSORED from 70.90.124.130 port 57020 ssh2


similarly, my /etc/rc.boot/ scripts to mail dmesg to root, and to use
blockdev to setra on all my drives didn't run either.


where the hell else am i supposed to put such scripts?

/etc/rc.boot hasn't been OK for packages to use for years, but it is THE
location for local boot scripts to exist, with all the usual benefits
of being run by run-parts (e.g. files with . in them not executed).

it's listed in the Debian FAQ /usr/share/doc/debian/FAQ/debian-faq.en.txt.gz
at around line 3500:

 Then, for compatibility, it runs the files (except those with a
 `.' in the filename) in `/etc/rc.boot/' too.  Any scripts in the
 latter directory are usually reserved for system administrator use,
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 and using them in packages is deprecated.
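The filename filtering that makes run-parts skip files with a dot in their name can be mimicked in a few lines. This sketch follows the rule documented in Debian's run-parts(8) for the default mode (names made only of ASCII letters, digits, underscores and hyphens) and only lists the executable files that would run; the function names are hypothetical and it is not a replacement for run-parts.

```shell
#!/bin/bash
# Sketch of Debian run-parts' default filename rule: only names made
# entirely of ASCII letters, digits, underscores and hyphens are run, so
# e.g. "00firewall" runs but "00firewall.sh" or "00firewall.dpkg-old" don't.

runparts_name_ok() {
  [[ $1 =~ ^[A-Za-z0-9_-]+$ ]]
}

# List the files in DIR that would be run: regular, executable, valid name.
# (run-parts itself sorts names in C-locale order; this uses glob order.)
runparts_list() {
  local dir=$1 f name
  for f in "$dir"/*; do
    name=${f##*/}
    if [[ -f $f && -x $f ]] && runparts_name_ok "$name"; then
      printf '%s\n' "$name"
    fi
  done
  return 0
}
```

`runparts_list /etc/rc.boot` should agree with `run-parts --test /etc/rc.boot`, which prints the scripts that would be run without running them.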


please revert this change, or at least provide an equivalent alternative.
you can't just take away useful - even vital - functionality like this
without warning.



flagged as critical because of the security problems this causes.

craig

-- 
craig sanders c...@taz.net.au






Bug#542888: bind9 stops during upgrade. again.

2009-08-23 Thread Craig Sanders
On Sun, Aug 23, 2009 at 01:56:18PM +0200, Ondřej Surý wrote:
> > a working nameserver is a core, essential component of a functioning
> > network. while named is down, *everything* on the network that uses that
> > nameserver ceases to function correctly.
>
> That's why you should never have only one nameserver 127.0.0.1 in
> your /etc/resolv.conf. Configure one or two more remote nameservers
> and you're done.

i'm talking about a network being disrupted, not a single host.

the hosts DON'T have 127.0.0.1 in resolv.conf, they have the nameserver.

> It's not critical severity - normal or wishlist would be more
> appropriate.

it is critical.  it breaks functionality for the entire local network.

> While I agree that it's not very convenient to have bind9 stopped for
> longer time periods, there are other ways how to prevent this kind of

it's not a matter of convenience. it's a matter of temporarily breaking
everything on the network that depends on DNS (which is pretty much
every network service or connection).


> outage. And it's entirely different from f.e. quagga, where restart
> could mean loss of connectivity at all. Which clearly doesn't happen
> with bind - you just lose the ability to resolve DNS, when you have only
> one nameserver configured.

actually, there are multiple nameservers on the network. it doesn't help
as much as you might think. most, if not all, client resolver libraries
try nameservers in series, not in parallel. they don't try the second
or third resolver until the first times out. for some services, the DNS
timeout is longer than the connection or authentication timeout.

clearly you have no experience of dozens or hundreds of users
whinging at you saying the internet isn't working or the
file-server/intranet/web site/etc is dead because the first nameserver
in their dhcp-assigned resolv list is down and the NEXT nameserver isn't
queried until the first times out.

even if, for example, their browser has the intranet's IP cached, the
users still can't get to the intranet because the login scripts can't find
the ldap server.

DNS is not just a convenience, it's a fundamental service that
everything else on the network depends on.

the bind9 package has worked reliably for many years with a restart
after upgrade rather than stop-early, start-late approach. there is no
reason for the current change in behaviour. it just maximises downtime.

craig

-- 
craig sanders c...@taz.net.au






Bug#542888: bind9 stops during upgrade. again.

2009-08-23 Thread Craig Sanders
On Mon, Aug 24, 2009 at 12:33:56AM +0200, Ondřej Surý wrote:
> While I don't exactly share LaMont's reasoning on this bug (LaMont:
> even if you do upgrade daemons separately, underlying libraries will
> change anyway while the daemon is running, so you need another restart
> anyway when libraries change), I do agree that it's much safer just to
> stop the daemon, install new version of bind9 and library
> dependencies, and after that to run it again.

instead of alluding to mysterious and unspecified failures that *might*
happen if a daemon like bind9 is restarted in the postinst rather than
stopped early and started later, how about actually giving some concrete
examples of those failures?

name some real problems that prove that it is less safe to restart after
an upgrade than stop before, start after.

and then prove that those problem(s) (if they even exist) are worse 
than taking down DNS for the entire duration of the upgrade.


whenever this issue comes up for any daemon, there's *always* some
hand-waving about the horrible problems that might happen, but never
even one single factual example.

contrast that to the actual problems that have been described (and
experienced) when the name-server is killed early and started later.

a real-world, factual problem that actually happens beats a
hypothetical, never-yet-been-demonstrated problem every time.


craig

-- 
craig sanders c...@taz.net.au






Bug#542888: bind9 stops during upgrade. again.

2009-08-23 Thread Craig Sanders
On Sun, Aug 23, 2009 at 06:00:11PM -0600, LaMont Jones wrote:
> On Mon, Aug 24, 2009 at 07:47:59AM +1000, Craig Sanders wrote:
> > i'm talking about a network being disrupted, not a single host. the
> > hosts DON'T have 127.0.0.1 in resolv.conf, they have the nameserver.

> I've gone back as far as 9.2.4, and haven't found anywhere that the
> bind9 package actually kept itself from stopping bind9 during the
> upgrade.  I do know that bind 8 had code in the scripts to do that.

i've already mentioned bug #453765.

that bug was closed on Fri, 13 Jun 2008 16:54:42 -0600 with the
following message signed by you:

 * Leave named running during update.  Closes: #453765


> > > It's not critical severity - normal or wishlist would be more
> > > appropriate.
> >
> > it is critical. it breaks functionality for the entire local
> > network.

> Generally what I've seen done is a separate IP for the resolver, and
> during a scheduled upgrade, that IP is migrated to another host,
> specifically to avoid any disruption.

why jump through such bizarrely complicated and dangerous hoops to avoid
a non-problem?

bind9 has never had a single demonstrated problem with
restart-after-upgrade.

and what do you do if you're ssh-ed in to that machine on that IP
address? you can't change it, otherwise you kill your ssh connection.

(BTW, ssh along with pppd are the two classic examples of why it's
a really bad idea to stop-before-upgrade and start-after instead of
restart-after. for those packages, that not only maximises downtime
it can prevent logging in to fix any errors. named can too, if
hosts.allow/hosts.deny are configured to reject ssh from unknown
hostnames or to only allow connections from your domain).


  it's not a matter of convenience. it's a matter of temporarily
  breaking everything on the network that depends on DNS (which is
  pretty much every network service or connection).

 Everything on the network that depends on this host for DNS.

which is, as i said, pretty much every network service or connection.

 Migrating the service to another host for the duration of the upgrade,
 or doing micro upgrades is certainly an option.

an even better option is for the bind9 package to actually make some
effort to minimise downtime by restarting after upgrade rather than
stopping early and starting late.

I'm still waiting to hear of any of the alleged problems caused by
restarting after upgrade that have been alluded to and hinted at
so mysteriously. so far, not one has even been theorised let alone
demonstrated to have happened.


and i can not fathom why there's so much resistance to just adding
'--restart-after-upgrade' to the call to dh_installinit.  it's a very
simple fix with no downside.


  the bind9 package has worked reliably for many years with a restart
  after upgrade rather than stop-early, start-late approach. there is no
  reason for the current change in behaviour. it just maximises downtime.
 
 Versions?

i don't know the version numbers, nor do i have an archive of every
historical release of the bind9 package to go searching through. i can
give you approximate dates instead:

it was fine for years before i reported the problem the first time in
Dec 2007 (bug #453765), and has been fine since that bug was closed in
June 2008 up until the recent change that re-introduced the bug.

craig

-- 
craig sanders c...@taz.net.au






Bug#542888: bind9 stops during upgrade. again.

2009-08-22 Thread Craig Sanders
On Fri, Aug 21, 2009 at 10:35:13PM -0600, LaMont Jones wrote:
 On Sat, Aug 22, 2009 at 10:47:16AM +1000, Craig Sanders wrote:
  when upgrading bind9, named is stopped until the upgrade is completed. 
 
 And there is no real way to make sure it stays working while the
 libraries and such are changed out from under it, based on some of the
 issues we ran into earlier in the bind9 train, which caused me to quit
 trying to keep it running during a dist-upgrade.

what issues?

i've never had a problem with bind restarting after upgrade in all the
years i've been running it and upgrading it.

the ONLY time i've ever had serious problems upgrading bind is when it
has had this bug, taking down the entire network for the duration of the
upgrade.

  this could be a LONG time during an apt-get
  {dist-,dselect-,}upgrade, especially if there are many packages
  being upgraded or if there are any debconf questions waiting to be
  answered.

 And more reason to upgrade daemons in separation from the rest of the
 machine.

which daemons in particular?

on most of the machines i run bind on, daemons are pretty much ALL that
they run.

in effect, you're saying "don't bother with apt-get dist-upgrade, just
upgrade each package individually".

part of the reason for running debian is so you don't have to remember
stupid crap like "upgrade bind before postfix unless you're running
squid, in which case you need to upgrade this first and then that", and
then hope that the sequence hasn't changed since the last time because
some new dependency has been introduced.

that's what dependencies and pre-depends are for.




  this has an obvious seriously detrimental and prolonged effect on
  the entire local network which depends on that nameserver.

  this is a repeat of an earlier bug (#453765), which was reported in
  Dec 2007 and fixed in June 2008.  It has come back.

 And if I recall, it was intentionally reintroduced to fix an issue
 with stopping the old daemon after the upgrade failed in a manner that
 made it seem to be a bad thing to do.

huh?  AFAICT, it is only the most recent version of bind9 that has
re-introduced the bug (at least, i hadn't noticed it until today). if
you can't recall why such a recent change was made, then perhaps there
wasn't a good reason for it.

  see also bug #471060 (debhelper, reported Mar 2008, fixed May 2008).
  a '--restart-after-upgrade' option was added to dh_installinit to
  provide a fix for this behaviour.  There was some suggestion that this
  might become the default behaviour but it looks like that hasn't
  happened.

 I'll have to revisit the whole issue, but the switch to using
 dh_installinit was done separate from

the first time around for this bug bind9 was already using dh_installinit.

  the "don't bother trying to keep it running through the upgrade"
 decision.

that decision equates to "we don't give a damn about our users'
networks".

a working nameserver is a core, essential component of a functioning
network. while named is down, *everything* on the network that uses that
nameserver ceases to function correctly.

minimising downtime for core network services is, and should be, a priority.

craig

-- 
craig sanders c...@taz.net.au






Bug#542888: bind9 stops during upgrade. again.

2009-08-21 Thread Craig Sanders
Package: bind9
Version: 1:9.6.1.dfsg.P1-3
Severity: critical
Justification: breaks unrelated software


when upgrading bind9, named is stopped until the upgrade is completed. 

this could be a LONG time during an apt-get {dist-,dselect-,}upgrade,
especially if there are many packages being upgraded or if there are any
debconf questions waiting to be answered.

this has an obvious seriously detrimental and prolonged effect on the
entire local network which depends on that nameserver.


this is a repeat of an earlier bug (#453765), which was reported in Dec 2007
and fixed in June 2008.  It has come back.

see also bug #471060 (debhelper, reported Mar 2008, fixed May 2008). a
'--restart-after-upgrade' option was added to dh_installinit to provide
a fix for this behaviour.  There was some suggestion that this might become
the default behaviour but it looks like that hasn't happened.






-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (990, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.30.4 (SMP w/4 CPU cores; PREEMPT)
Locale: LANG=en_AU, LC_CTYPE=en_AU (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/bash

Versions of packages bind9 depends on:
ii  adduser                   3.110           add and remove users and groups
pn  bind9utils                <none>          (no description available)
ii  debconf [debconf-2.0]     1.5.27          Debian configuration management sy
pn  libbind9-50               <none>          (no description available)
ii  libc6                     2.9-25          GNU C Library: Shared libraries
ii  libcap2                   1:2.16-5        support for getting/setting POSIX.
ii  libdb4.7                  4.7.25-7        Berkeley v4.7 Database Libraries [
pn  libdns50                  <none>          (no description available)
ii  libgssapi-krb5-2          1.7dfsg~beta3-1 MIT Kerberos runtime libraries - k
pn  libisc50                  <none>          (no description available)
pn  libisccc50                <none>          (no description available)
pn  libisccfg50               <none>          (no description available)
ii  libldap-2.4-2             2.4.17-1        OpenLDAP libraries
pn  liblwres50                <none>          (no description available)
ii  libssl0.9.8               0.9.8k-4        SSL shared libraries
pn  libxml2                   <none>          (no description available)
ii  lsb-base                  3.2-23          Linux Standard Base 3.2 init scrip
ii  net-tools                 1.60-23         The NET-3 networking toolkit
ii  netbase                   4.37            Basic TCP/IP networking system

bind9 recommends no packages.

Versions of packages bind9 suggests:
pn  bind9-doc                 <none>          (no description available)
pn  dnsutils                  <none>          (no description available)
pn  resolvconf                <none>          (no description available)
pn  ufw                       <none>          (no description available)

-- debconf information excluded






Bug#501755: oocalc/writer/etc won't start. unexpected token `fi' on line 367 of soffice

2008-10-10 Thread Craig Sanders
On Fri, Oct 10, 2008 at 08:23:26AM +0200, Rene Engelhard wrote:
 Already reported a few times and pending. What if you would look in the BTS
 first before filing the fifth instance of this?

i did look.  didn't see anything resembling it, so submitted a bug report.

craig

-- 
craig sanders [EMAIL PROTECTED]






Bug#471060: Processed: 'dh_installinit -n' does not fix the problem.

2008-04-25 Thread Craig Sanders
On Fri, Apr 25, 2008 at 01:38:39PM -0400, Joey Hess wrote:
 AFAIK, start-stop-daemon still defaults to refusing to stop a daemon if
 the executable has changed from the executable it initially started. Ie:

still defaults - since when?  i've never seen that behaviour.

start-stop-daemon uses pidfiles (e.g. /var/run/dictd) to kill the
running daemon.
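The pidfile approach described above can be sketched in plain sh. This is a minimal, hypothetical helper (stop_by_pidfile is not a real tool) and not start-stop-daemon itself. It reads a PID from a pidfile and signals that process, so replacing the binary on disk makes no difference to it. It deliberately does not model start-stop-daemon's --exec matching, which is what the quoted reply above describes.

```shell
#!/bin/sh
# Hypothetical helper, not start-stop-daemon: stop a daemon using only its
# pidfile. The binary on disk is never consulted, so replacing it with a
# new copy does not affect whether the stop works.
stop_by_pidfile() {
    pidfile=$1
    if [ ! -r "$pidfile" ]; then
        echo "already stopped (no pidfile)"
        return 0
    fi
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only checks that the process exists
    if [ -z "$pid" ] || ! kill -0 "$pid" 2>/dev/null; then
        echo "already stopped"
        return 0
    fi
    kill "$pid"    # SIGTERM, the usual "stop" signal
    echo "stopped $pid"
}
```

A stale or missing pidfile is reported as "already stopped", much like the "rsyslogd already stopped." message in the transcripts below.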


 [EMAIL PROTECTED]:/usr/sbin/etc/init.d/dictd start
 Starting dictionary server: dictd.

your example seems to be specific to dictd, not to all daemons. that
indicates a bug in dictd or in dictd's init.d script, not a general
problem with start-stop-daemon.


i've never seen anything like that problem with other packages that
restart in the postinst.


e.g. one package that the maintainer recently fixed to work around the
bug in dh_installinit is rsyslog.

example one: replace /usr/sbin/rsyslogd by a copy of itself:

ganesh:/usr/sbin# ps aux | grep rsyslogd
root 23408  6.0  0.1  77968  5056 ?        Sl   09:48   0:00 /usr/sbin/rsyslogd -c3 -staz.net.au
root 23419  0.0  0.0   5320   748 pts/16   R+   09:48   0:00 grep rsyslogd

ganesh:/usr/sbin# mv rsyslogd rsyslogd.orig
ganesh:/usr/sbin# cp -af rsyslogd.orig rsyslogd
ganesh:/usr/sbin# invoke-rc.d rsyslog restart
Stopping enhanced syslogd: rsyslogd.
Starting enhanced syslogd: rsyslogd.

ganesh:/usr/sbin# ps aux | grep rsyslogd
root 23471  6.0  0.1  77956  5056 ?        Sl   09:48   0:00 /usr/sbin/rsyslogd -c3 -staz.net.au
root 23483  0.0  0.0   5320   748 pts/16   S+   09:48   0:00 grep rsyslogd


that works.  PID of rsyslogd changed from 23408 to 23471.  successfully 
restarted.



example two: replace rsyslogd with /bin/echo:

NOTE: /bin/echo is a better program to use for this experiment than
  /bin/ls because it doesn't complain about args it doesn't
  understand and then die with a non-zero exit code.

ganesh:/usr/sbin# cp -af /bin/echo rsyslogd
ganesh:/usr/sbin# invoke-rc.d rsyslog restart
Stopping enhanced syslogd: rsyslogd.
Starting enhanced syslogd: rsyslogd-c3 -staz.net.au
.
ganesh:/usr/sbin# ps aux | grep rsyslogd
root 24486  0.0  0.0   5320   716 pts/16   R+   09:51   0:00 grep rsyslogd


that also works. you can see the output of /bin/echo on the end of the
Starting... line.



example three: move the original rsyslogd back into place

ganesh:/usr/sbin# mv rsyslogd.orig rsyslogd
ganesh:/usr/sbin# invoke-rc.d rsyslog restart
Stopping enhanced syslogd: rsyslogd already stopped.
Starting enhanced syslogd: rsyslogd.

ganesh:/usr/sbin# ps aux | grep rsyslogd
root 24563  0.0  0.1  77968  5076 ?        Sl   09:51   0:00 /usr/sbin/rsyslogd -c3 -staz.net.au
root 24927  0.0  0.0   5320   716 pts/16   R+   09:55   0:00 grep rsyslogd


once again, that works.  start-stop-daemon doesn't care that the binary
has changed since it was originally started.


THAT is the typical behaviour of start-stop-daemon during an upgrade.

there are numerous daemons within debian that already work around
dh_installinit's bug in order to do the right thing and restart in the
postinst.

 



 Therefore if I were to take your advice that this bug is somehow RC, and
 change debhelper as you suggest today, I would in fact completely break
 upgrades of the majority of daemons in Debian.

no, it shows that dictd or the dictd package is already broken in some way.



 dh_installinit -n can be used to ditch debhelper's default init script
 starting code, and put in init script code that is tuned to the particular 
 daemon.

it is the exception that requires unusual handling, NOT the general
case.

dictd is an exception, not the rule. it requires special handling (or
bug fixing).

the general case is that daemons can and should continue to work
smoothly during upgrade, do not exhibit stop/start/restart problems
if the daemon binary is changed, and should just be restarted in the
postinst.

the point is that the rsyslog maintainer shouldn't have had to work
around this bug in dh_installinit. he's using the tool to make
package maintenance and conformance to policy/standards easier and
semi-automatic. dh_installinit should do the right thing for the
general, most common, case and still allow manual overrides/workarounds
for special cases.

instead, it currently does the wrong thing for the general case,
handles the exception as if it is the rule, and requires manual
overrides/workarounds to properly handle the general case. i.e. the
reverse of what it should do.


craig

-- 
craig sanders [EMAIL PROTECTED]






Bug#471051: rsyslog shuts down too early during upgrade

2008-03-18 Thread Craig Sanders
On Tue, Mar 18, 2008 at 04:21:55AM +0100, Michael Biebl wrote:
 You could summarise it like:
 Either optimise for minimal downtime or maximum safety.
 
 Your preference seems to be the former, mine the latter.

no, your straw-man dichotomy is nothing like what i have been saying.

your preference guarantees maximum downtime and maximum data loss.

mine (i.e. the correct behaviour, restart in postinst) minimises both.


i've experienced and reported ACTUAL data loss from the current upgrade
behaviour of rsyslog. a couple of hours' worth of syslog data from
several machines gone - never logged to disk because rsyslog was down
during the upgrade, and the upgrade was waiting for me to answer a
question about another package's config files. data that would NOT have
been lost if rsyslog had done the right thing and stayed up until it was
restarted in the postinst.

you've posited a far-fetched HYPOTHETICAL scenario where rsyslog MAY
lose data due to some unexplained circumstances and some hand-waving
about incompatible data files (which rsyslog doesn't even have)
possibly confusing it to the point that it crashes.

practice trumps theory, and reality trumps imagination.

craig

-- 
craig sanders [EMAIL PROTECTED]





Bug#471060: debhelper: dh_installinit's default behaviour is broken

2008-03-18 Thread Craig Sanders
On Mon, Mar 17, 2008 at 09:22:53PM +0100, Stefan Fritsch wrote:
  dh_installinit's default behaviour should be to not stop daemons,
  but to issue a reload, restart, or force-restart in the .postinst.
 
 JFTR, doing a simple reload on upgrade is _always_ wrong. After an 
 upgrade the new executables must be running, not the old ones. For 
 example, the old executables might still contain security issues.

true.

a restart in the postinst, not reload, is the appropriate action.

 And the current debhelper behaviour is the safer way and should be the 
 default. A daemon crashing because some of its data files no longer 
 fit to the running process can cause much more severe data loss than 
 just some down time. 

that is the appropriate behaviour ONLY for those particular daemons
where that actually happens.

for daemons like bind9 and rsyslog and most others, it is the wrong
behaviour - it maximises downtime, disrupts the entire network (for
named) and causes irreplaceable data loss (for rsyslog).

if downtime is unavoidable, then it's unavoidable - nothing can be done
about it, it just has to be accepted. but where it IS avoidable, then it
SHOULD be avoided...and, in either case, downtime should be minimised.



if there are a known set of packages which have this behaviour then
either (a future version of) apt or dpkg could prioritise them (perhaps
via some tag in the control file) so that they and their
dependencies are unpacked and configured before other unprioritised
packages.  

or, even without any explicit support in apt or dpkg, a local sysadmin
could write a shell script to upgrade just those daemon packages first
before doing a full dist-upgrade. or upgrade them manually (which is
what i have to do now for bind9. and squid. and now also for rsyslog).
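Here is a minimal sketch of such a local script. It assumes the short list of problem packages is known in advance, and the list below is only an example taken from this message. It relies on apt-get's --only-upgrade option, which exists in current apt but may not have been available in 2008. The function name is hypothetical, and APT_GET is a variable only so a dry run can print the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical local helper: upgrade a short, known list of daemon packages
# on their own, so their downtime is not stretched across a whole
# dist-upgrade, and then run the full dist-upgrade.
APT_GET=${APT_GET:-apt-get}          # set APT_GET=echo for a dry run
PRIORITY_PACKAGES="bind9 squid rsyslog"

upgrade_priority_first() {
    # --only-upgrade: upgrade the listed packages if they are installed,
    # but never install them if absent. $PRIORITY_PACKAGES is unquoted on
    # purpose so it splits into separate package names.
    $APT_GET install --only-upgrade $PRIORITY_PACKAGES || return $?
    $APT_GET dist-upgrade
}
```

Calling upgrade_priority_first as root performs both steps. With APT_GET=echo it only prints the two apt-get command lines.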

but that's only possible if it's a relatively small set of known
problematic packages.  it becomes impossible - pointless - if ALL
daemons stop then start later, rather than just restart.

why pointless? because the point is to upgrade and configure those
problematic packages as quickly as possible, to minimise downtime. if
all daemon packages do the wrong thing, then the end result would be no
different to just doing a dist-upgrade.



BTW, most daemon packages don't have lots of data files open. most
just read a config file at startup (or on a HUP signal), and open a
log file or socket. 

one that usually does have many data files open, postfix, restarts in
the postinst...and has done so for many years without problem.



 And a maintainer would have to check on every upload if and for which
 old versions it is safe not to stop the daemon before replacing the
 files. This would doubtless lead to many other bugs.

the maintainer could do it in such a complicated manner, but it's not
necessary.

most such problems occur when a significant new version is released
(e.g. a major version, not just a minor point release). the package
maintainer could stop and then start if the old version was <= a
particular version number, or just restart if it wasn't.
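The following sketch shows one way to express that check. dpkg runs the new package's preinst as "preinst upgrade OLD-VERSION", so the version being replaced is known there, and dpkg --compare-versions can compare it against a threshold. LAST_RISKY_VERSION is a made-up value, and STOP_CMD is a variable only so the logic can be exercised without root.

```shell
#!/bin/sh
# Hypothetical preinst logic: stop early only when upgrading from a version
# old enough to be risky; otherwise leave the daemon running, so that the
# postinst restart is the only interruption.
LAST_RISKY_VERSION="2.0.8"     # made-up threshold for illustration
STOP_CMD=${STOP_CMD:-"invoke-rc.d rsyslog stop"}

needs_stop_before_upgrade() {
    old_version=$1
    [ -n "$old_version" ] &&
        dpkg --compare-versions "$old_version" le "$LAST_RISKY_VERSION"
}

# dpkg calls the new preinst as: preinst upgrade OLD-VERSION [NEW-VERSION]
preinst() {
    if [ "$1" = upgrade ] && needs_stop_before_upgrade "$2"; then
        $STOP_CMD    # unquoted on purpose: splits into command and args
    fi
}
```

The postinst can then restart unconditionally. An LSB-style init script's restart also starts a daemon that is not running, so the same call covers both paths.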

a simpler generic method that would work reasonably well is that the
maintainer would just have to know whether there was any risk of their
daemon package having such problems. if there was a risk, then stop in
prerm, start in postinst. otherwise, just restart in the postinst.
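The simpler method can be sketched as a decision table driven by one per-package flag. This is an illustration only. DAEMON_AT_RISK_DURING_UPGRADE is a hypothetical setting, and the functions print the action they would take rather than calling invoke-rc.d, so the logic can be checked without root. The arguments follow Debian's maintainer-script calling conventions: prerm upgrade NEW-VERSION, prerm remove, and postinst configure with the previously configured version.

```shell
#!/bin/sh
# Hypothetical sketch: the maintainer sets one flag saying whether the
# daemon is at risk if its files are replaced while it runs, and the
# maintainer-script behaviour follows from that flag.
DAEMON_AT_RISK_DURING_UPGRADE=no

prerm_action() {
    case "$1" in
        remove|deconfigure)
            echo stop ;;
        upgrade)
            if [ "$DAEMON_AT_RISK_DURING_UPGRADE" = yes ]; then
                echo stop      # stop early, start again in the postinst
            else
                echo none      # keep running until the postinst
            fi ;;
        *)
            echo none ;;
    esac
}

postinst_action() {
    case "$1" in
        configure)
            if [ "$DAEMON_AT_RISK_DURING_UPGRADE" = yes ]; then
                echo start
            else
                echo restart   # the only interruption in the common case
            fi ;;
        *)
            echo none ;;
    esac
}
```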

in any case, it's only necessary for those daemon packages where there
IS such a problem or risk - it's not at all necessary for most daemon
packages.  a simple restart in the postinst will suffice.


craig

-- 
craig sanders [EMAIL PROTECTED]


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#471051: rsyslog shuts down too early during upgrade

2008-03-18 Thread Craig Sanders
On Tue, Mar 18, 2008 at 04:01:05AM +0100, Michael Biebl wrote:
 Craig Sanders wrote:
  On Sun, Mar 16, 2008 at 03:14:58AM +0100, Michael Biebl wrote:
 
   Second, if you replace files while the daemon is still running,
   this can lead to all sorts of subtle failures, e.g. daemons that
   dynamically load functionality via shared modules (as rsyslog does)
   might crash.
  
  'MIGHT crash' is a whole lot better than 'definitely WILL be shut down
  for the entire duration of the upgrade - many minutes or even hours(*)'.
  
  with the former you have a chance of significant downtime during
  upgrade.
  
  with the latter, you are guaranteeing significant downtime during
  upgrade.
 
 The difference is, that a crashing daemon might lead to data corruption,
 which is much worse than a slightly longer downtime.

1. with rsyslog - or any other daemon that writes to sequential text
files - data loss is as bad as data corruption. and rsyslog's current
behaviour during upgrade guarantees long duration downtime and long
duration data loss.

a guarantee of data loss is worse than just the risk of same.


2. you seem determined to deliberately miss the point.

i am NOT saying that it is *ALWAYS* bad to stop a daemon in the prerm
and start it in the postinst.

i am saying that it is bad to do it in cases where there is no
demonstrated need. and I am saying that it is bad to do that simply
because it is the default behaviour of debhelper's dh_installinit
without even thinking about it or, worse, because logical deduction
based on false premises (that most daemons are at risk of this problem)
leads you to a false conclusion (that stopping early and starting late
is the correct default behaviour).

for the tiny minority of daemons where there is risk of data loss if the
upgrade is not performed in a particular manner, it is entirely
appropriate to do whatever is necessary to ensure safe and reliable
upgrade, including stop-early, start-late.

for the vast majority of daemons, where there is no such risk, the
correct behaviour is to just restart in the postinst as that will
minimise downtime and, in cases like rsyslog, minimise loss of data.

not only is it the correct behaviour, it should be the default behaviour
- simply because it IS the correct behaviour in all but a handful of
exceptional cases.



 FWIW, if it is correct, that postfix behaves the way you describe, then
 this is broken.

no, it's not.

postfix does the right thing.


 [ mistaken understanding deleted ]
 The only reasonable and safe choice is, to
 stop postfix in prerm before those other services.

1. if you don't know how mail works, or how postfix works, then you really
shouldn't claim to know what is the only reasonable and safe choice.

2. there's no way of guaranteeing that postfix will be upgraded before,
say, mysql or postgresql without making those packages Pre-Depend on
postfix. which would be absurd, most people who use either db don't use
it for postfix.

e.g. in a dist-upgrade, mysql is lower in sort order so will likely
upgrade before postfix. postgres is slightly higher than postfix, so
will likely upgrade after postfix.

3. for those users who have created such inter-dependencies on their own
system, they know - or should know - the correct order to upgrade the
packages that are important to them. they have specific needs, so it
is up to them to handle them appropriately, but it is wrong to force
ordinary users who don't have such specific needs to jump through those
same hoops.

craig

-- 
craig sanders [EMAIL PROTECTED]





Bug#471051: rsyslog shuts down too early during upgrade

2008-03-15 Thread Craig Sanders
Package: rsyslog
Version: 2.0.3-1
Severity: critical
Justification: causes serious data loss


rsyslog shuts down at the start of the upgrade and only gets restarted
when the package is configured.

this is broken. instead of stopping rsyslog in the prerm and starting it
in the postinst, just restart it in the postinst. actually, according
to the comments in the scripts, it looks like the stop and start are
inserted by dh_installinit.  i don't know whether that's because the bug
is actually in dh_installinit (debhelper package) or whether it's
because your package does not use the right args to dh_installinit.


here's the problem, /var/lib/dpkg/info/rsyslog.prerm:

#!/bin/sh
set -e
# Automatically added by dh_installinit
if [ -x /etc/init.d/rsyslog ]; then
    if [ -x "`which invoke-rc.d 2>/dev/null`" ]; then
        invoke-rc.d rsyslog stop || exit $?
    else
        /etc/init.d/rsyslog stop || exit $?
    fi
fi
# End automatically added section
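For contrast, here is a minimal sketch of the postinst side of the behaviour this report asks for. The prerm does nothing on upgrade, the old daemon keeps running while the new files are unpacked, and it is restarted once when the package is configured. It follows the shape of the generated prerm above but is not debhelper's actual output. INITD is overridable only for testing (invoke-rc.d itself always uses /etc/init.d), and command -v stands in for which.

```shell
#!/bin/sh
# Hypothetical postinst fragment: restart the daemon once, after the new
# version is unpacked, instead of stopping it in the prerm.
set -e
INITD=${INITD:-/etc/init.d}   # overridable only to make testing easy

restart_daemon() {
    name=$1
    [ -x "$INITD/$name" ] || return 0     # no init script: nothing to do
    if command -v invoke-rc.d >/dev/null 2>&1; then
        invoke-rc.d "$name" restart
    else
        "$INITD/$name" restart
    fi
}

case "$1" in
    configure)
        restart_daemon rsyslog
        ;;
esac
```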




in a dist-upgrade, it could be many minutes - or even hours if the
sysadmin doing the upgrade is doing other work simultaneously - between
the start of the upgrade and the time that rsyslog is configured,
especially if there are any questions waiting to be answered during the
upgrade.

while rsyslog is down, all log entries for the system being upgraded are
lost, along with all log entries for any hosts which use that system
as the syslog host.



for example:

Mar 15 23:10:28 ganesh rsyslogd: [origin software="rsyslogd"
swVersion="2.0.2" x-pid="4493" x-info="http://www.rsyslog.com"] exiting
on signal 15.

Mar 16 00:37:31 ganesh rsyslogd: [origin software="rsyslogd"
swVersion="2.0.3" x-pid="17184"
x-info="http://www.rsyslog.com"][x-configInfo udpReception="Yes"
udpPort="514" tcpReception="No" tcpPort="0"] restart
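The length of the outage in that example can be worked out from the two timestamps. The helpers below are a small sketch with hypothetical names. They assume the gap is under 24 hours, so an end time earlier than the start time means midnight was crossed.

```shell
#!/bin/sh
# Hypothetical helpers: seconds between two syslog HH:MM:SS times.
# Assumes the gap is under 24 hours; an end time earlier than the start
# time means the outage crossed midnight.

# Strip one leading zero so "08" and "09" are not read as invalid octal.
strip0() {
    n=${1#0}
    echo "${n:-0}"
}

to_seconds() {
    h=${1%%:*}
    rest=${1#*:}
    m=${rest%%:*}
    s=${rest#*:}
    echo $(( $(strip0 "$h") * 3600 + $(strip0 "$m") * 60 + $(strip0 "$s") ))
}

gap_seconds() {
    start=$(to_seconds "$1")
    end=$(to_seconds "$2")
    if [ "$end" -lt "$start" ]; then
        end=$((end + 86400))    # crossed midnight
    fi
    echo $((end - start))
}
```

For the log lines above, gap_seconds 23:10:28 00:37:31 prints 5223. That is 1 hour 27 minutes 3 seconds with rsyslog down and nothing logged.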






-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.24 (SMP w/2 CPU cores; PREEMPT)
Locale: LANG=en_AU, LC_CTYPE=en_AU (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/bash

Versions of packages rsyslog depends on:
ii  libc6       2.7-9              GNU C Library: Shared libraries
ii  lsb-base    3.2-4              Linux Standard Base 3.2 init scrip
ii  zlib1g      1:1.2.3.3.dfsg-11  compression library - runtime

Versions of packages rsyslog recommends:
ii  logrotate   3.7.1-3            Log rotation utility

-- no debconf information






Bug#471060: debhelper: dh_installinit's default behaviour is broken

2008-03-15 Thread Craig Sanders
Package: debhelper
Version: 6.0.8
Severity: critical
Justification: causes serious data loss


dh_installinit adds code to a package's .prerm script to stop the daemon
at the beginning of the upgrade, and then adds code to restart it in the
.postinst script.

this results in important system daemons (including bind9 and rsyslog)
being stopped for the entire duration of the upgrade - which could be a
very long time in a dist-upgrade.

in the case of bind9, that means no DNS for the entire network it
is serving, which effectively breaks most network activity for that
network.

in the case of rsyslog, that means no logging for the host and for any
other hosts/devices which use it as a syslog host - causing serious loss
of irreplaceable data.

in the case of other daemons, whatever service they are providing 
is shut down for the duration of the upgrade.


dh_installinit's default behaviour should be to not stop daemons,
but to issue a reload, restart, or force-restart in the .postinst.

better yet, have no default and force the package maintainer using
dh_installinit to use their brains and their knowledge of their package
to decide which is the most appropriate behaviour for their particular
daemon. exit with an error message if dh_installinit is used without
specifying stop and reload/restart behaviour for prerm and postinst.



see bug numbers 453765 (bind) and 471051 (rsyslog) for more info.



-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.24 (SMP w/2 CPU cores; PREEMPT)
Locale: LANG=en_AU, LC_CTYPE=en_AU (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/bash

Versions of packages debhelper depends on:
ii  binutils    2.18.1~cvs20080103-1 The GNU assembler, linker and bina
ii  dpkg-dev    1.14.16.6            package building tools for Debian
ii  file        4.23-2               Determines file type using magic
ii  html2text   1.3.2a-3             An advanced HTML to text converter
ii  man-db      2.5.1-3              on-line manual pager
ii  perl        5.8.8-12             Larry Wall's Practical Extraction
ii  po-debconf  1.0.12.1             manage translated Debconf template

debhelper recommends no packages.

-- no debconf information






Bug#471051: rsyslog shuts down too early during upgrade

2008-03-15 Thread Craig Sanders
On Sun, Mar 16, 2008 at 01:16:22AM +0100, Michael Biebl wrote:
 Craig Sanders wrote:
  severity 471051 critical
  thanks
  
  this bug can not be downgraded to normal because it is a bug that
  results in a loss of important and irreplaceable data. that is one of the
  defining characteristics of a critical bug. loss of logging also has
  security implications, which is also a critical issue.
 
 Do you want to play bts ping-pong?

nope, but a critical bug is a critical bug.


  On Sat, Mar 15, 2008 at 03:32:00PM +0100, Michael Biebl wrote:
  Craig Sanders wrote:
  Package: rsyslog
  Version: 2.0.3-1
  Severity: critical
  Justification: causes serious data loss
 
 
  rsyslog shuts down at the start of the upgrade and only gets restarted
  when the package is configured.
 
  this is broken. instead of stopping rsyslog in the prerm and starting it
  Well, this is standard behaviour on Debian: daemons are stopped before
  upgrade and started after the upgrade.
  
  no, it is not and never has been standard behaviour on debian.
 
 You seem to be utterly mistaken here:

again, nope. i've been in debian a lot longer than you have. i know what
the behavior is, was, and should be.

AFAICT, it is only packages that misuse dh_installinit, like yours
does, which exhibit this broken behaviour.  other packages, where
the maintainers write their own pre/post scripts do not.

 grep "invoke-rc.d .* stop" *.prerm

1. the fact that some other packages have the same or similar broken
behaviour as yours does not excuse or justify your package being broken.


2. yes, i had the same initial thought, of doing a simple grep to find
packages which stop the daemon in the prerm script.

but, as is obvious from a little bit more thought about it, a simple
grep won't tell you whether the stop command is wrapped inside an
if/then or case statement, will it? so it tells you nothing useful.

you have to actually examine each script to find out whether it is
only stopping the daemon on a remove/purge (in which case, running the
init.d script or invoke-rc.d to stop the daemon is correct behavior), or
whether it's doing it all the time. in which case, it's broken behaviour
- see point 1 above.
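The point about grep can be demonstrated directly. Everything in the sketch below is made up for illustration: a prerm for a hypothetical exampled daemon that stops it only on remove/deconfigure, and a stub invoke-rc.d that just records its arguments. The grep pattern is the one from the thread, quoted so the shell cannot expand .* as a filename pattern.

```shell
#!/bin/sh
# Demonstration: a prerm that stops its daemon only on remove/deconfigure
# still matches the naive grep.
tmp=$(mktemp -d)

cat > "$tmp/exampled.prerm" <<'EOF'
#!/bin/sh
set -e
case "$1" in
    remove|deconfigure)
        invoke-rc.d exampled stop
        ;;
esac
EOF

# Stub invoke-rc.d that appends its arguments to a log file.
mkdir "$tmp/bin"
printf '#!/bin/sh\necho "$*" >> "%s/calls"\n' "$tmp" > "$tmp/bin/invoke-rc.d"
chmod +x "$tmp/bin/invoke-rc.d"

# The naive grep counts this script as one that stops the daemon...
naive_hits=$(grep -c 'invoke-rc.d .* stop' "$tmp/exampled.prerm")

# ...but on an upgrade the stop is never reached,
PATH="$tmp/bin:$PATH" sh "$tmp/exampled.prerm" upgrade 1.0-2
upgrade_calls=$(cat "$tmp/calls" 2>/dev/null || true)

# and only a removal actually stops it.
PATH="$tmp/bin:$PATH" sh "$tmp/exampled.prerm" remove
remove_calls=$(cat "$tmp/calls")
```

Here naive_hits is 1 even though upgrade_calls is empty, which is why each matching script has to be read rather than counted.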

  i've been a DD for over 10 years, and it's only recently that some
  daemons have been doing this on upgrade rather than reloading/restarting
  in the postinst. in fact, most still do just a reload/restart.
  
  there is NO NEED WHATSOEVER to stop a daemon just because it is being
  upgraded.
 
 Please don't shout.

it's emphasis, not shouting.

shouting is when entire lines or paragraphs are all-caps.

emphasis is when only certain words or phrases are all-caps.


  postfix, like MOST other daemon packages i use, also doesn't. postfix's
 
 Wrong, most packages use the way I described.

some packages are broken in the way that has been described.

that doesn't make it right.

that just means that there are other broken packages.

  One of the reasons, why it is done this way, is that you can't
  guarantee a clean shutdown of the daemon, when you have replaced its
  files, while it is still running.
  
  name one daemon where that's the case.
 
 I already gave you one.

no, you didn't.  you made an assertion without any evidence, not even a
single example.

i'll ask again:

which daemon is it that MUST be shut down at the start of the upgrade and
only started again in the postinst?

note that MUST there, i.e. that it won't work any other way. that's not
the same as "it's easier for me to do it this way" or "i didn't think
about it and am just using the default behaviour of a packaging helper
tool".


if you can find one package that MUST do it this way, then that's
a special case that must be handled appropriately by the package
maintainer. it's not a reason for doing it that way for every other
package.


  the correct thing to do is to handle that as a special case in the
  postinst script - check if rklogd is running, stop it if it is, and then
  restart rsyslogd as normal.
 
 Great idea, let's make our maintainer scripts an unmaintainable mess of
 special cases.

1. it's not an unmaintainable mess. the postfix package has been
maintained for years without exhibiting the broken behaviour of yours.
on an upgrade, it keeps working without pause right up until the
postinst restarts it. downtime is the absolute minimum required to
restart postfix, anywhere from a fraction of a second (on lightly loaded
systems with simple config) to a few seconds (on heavily loaded systems
with complex postfix configuration). that's what daemon packages SHOULD
do.
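The postfix-style behaviour described above (the daemon keeps running through the upgrade and is only restarted by the postinst), together with the rklogd special case suggested earlier, might look something like this sketch. It is not rsyslog's actual postinst: the use of `invoke-rc.d`, and stopping a leftover rklogd with `pidof` and `kill`, are assumptions made for illustration.

```shell
#!/bin/sh
# postinst sketch along the lines of the rsyslog/rklogd special case.
# dpkg calls postinst as "configure <most-recently-configured-version>"
# (the version is empty on a first install), or "abort-upgrade ..." etc.
set -e

# Assumptions: invoke-rc.d is available, and a leftover rklogd can
# simply be killed by PID; neither is taken from the real rsyslog package.
rklogd_running()  { pidof rklogd >/dev/null 2>&1; }
stop_rklogd()     { kill $(pidof rklogd); }  # unquoted: may be several PIDs
restart_rsyslog() { invoke-rc.d rsyslog restart; }
start_rsyslog()   { invoke-rc.d rsyslog start; }

rsyslog_postinst() {
    case "$1" in
        configure)
            # special case: only act on rklogd if it is actually running
            if rklogd_running; then
                stop_rklogd
            fi
            if [ -n "$2" ]; then
                # upgrade: rsyslogd kept running until now, restart it
                restart_rsyslog
            else
                # first install: nothing is running yet, just start it
                start_rsyslog
            fi
            ;;
        abort-upgrade)
            # with a prerm that leaves the daemon running on upgrade,
            # there is nothing to undo here
            ;;
        abort-remove|abort-deconfigure)
            # the prerm may have stopped the daemon; bring it back
            start_rsyslog
            ;;
        *)
            echo "postinst called with unknown argument '$1'" >&2
            return 1
            ;;
    esac
}
```

A real postinst would end with `rsyslog_postinst "$@"`. The special case costs a few lines in one branch, rather than stopping the daemon for the whole upgrade.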

(BTW, i know from personal experience that this is the case. i
maintained the postfix-tls package for over a year back when encryption
stuff couldn't go in main due to insane US crypto-export laws.
encryption support has now been merged into the main postfix package,
and postfix-tls has been dropped because it's no longer needed.)

in fact, many other packages do it the same way that postfix does - the
postfix's prerm

Bug#471051: rsyslog shuts down too early during upgrade

2008-03-15 Thread Craig Sanders
On Sun, Mar 16, 2008 at 03:00:49AM +0100, Michael Biebl wrote:
 Craig Sanders wrote:
  On Sun, Mar 16, 2008 at 01:16:22AM +0100, Michael Biebl wrote:
 
  
  grep 'invoke-rc.d .* stop' *.prerm
  
  1. the fact that some other packages have the same or similar broken
  behaviour as yours does not excuse or justify your package being broken.
 
 You are talking about some, while it clearly is the majority of the
 packages that use this behaviour. If you think this is wrong, this
 should be generally discussed (e.g. on debian-devel).
 You intentionally try to hide facts.

i am not hiding facts, intentionally or otherwise.

please confine yourself to discussing the issue rather than making
irrelevant personal attacks.  i assure you that you do not want me
to start making personal attacks on you.  i will be much better at that
than you are if i choose to go down that route.

and while you're at it, stop being so defensive.  this is about a bug in
your package.  it's not about you personally.


  2. yes, i had the same initial thought, of doing a simple grep to find
  packages which stop the daemon in the prerm script.
  
  but, as is obvious from a little bit more thought about it, a simple
  grep won't tell you whether the stop command is wrapped inside an
  if/then or case statement, will it? so it tells you nothing useful.
  
 
 Sure, I quickly glanced over the packages, and only openssh seems to run
 stop only during remove/deconfigure.
 This makes 77:1

there's also postfix, which i've mentioned before.  and apache.  and
most of the following:

$ grep -l 'stop' /var/lib/dpkg/info/*.prerm | xargs grep -l 'remove'  
/var/lib/dpkg/info/acct.prerm
/var/lib/dpkg/info/apache2-mpm-itk.prerm
/var/lib/dpkg/info/autofs.prerm
/var/lib/dpkg/info/console-common.prerm
/var/lib/dpkg/info/console-tools.prerm
/var/lib/dpkg/info/cpufrequtils.prerm
/var/lib/dpkg/info/cupsys.prerm
/var/lib/dpkg/info/debtorrent.prerm
/var/lib/dpkg/info/ebtables.prerm
/var/lib/dpkg/info/gdm.prerm
/var/lib/dpkg/info/gom.prerm
/var/lib/dpkg/info/gpm.prerm
/var/lib/dpkg/info/john.prerm
/var/lib/dpkg/info/kdm.prerm
/var/lib/dpkg/info/lirc.prerm
/var/lib/dpkg/info/lm-sensors.prerm
/var/lib/dpkg/info/locate.prerm
/var/lib/dpkg/info/mgetty-fax.prerm
/var/lib/dpkg/info/mpd.prerm
/var/lib/dpkg/info/mt-st.prerm
/var/lib/dpkg/info/nfs-common.prerm
/var/lib/dpkg/info/nfs-kernel-server.prerm
/var/lib/dpkg/info/nut.prerm
/var/lib/dpkg/info/nvidia-glx.prerm
/var/lib/dpkg/info/nvi.prerm
/var/lib/dpkg/info/openssh-server.prerm
/var/lib/dpkg/info/postfix.prerm
/var/lib/dpkg/info/postgresql-8.3.prerm
/var/lib/dpkg/info/ppp.prerm
/var/lib/dpkg/info/procps.prerm
/var/lib/dpkg/info/quota.prerm
/var/lib/dpkg/info/rplay-server.prerm
/var/lib/dpkg/info/rsync.prerm
/var/lib/dpkg/info/shorewall-doc.prerm
/var/lib/dpkg/info/slashem-common.prerm
/var/lib/dpkg/info/spamassassin.prerm
/var/lib/dpkg/info/squid.prerm
/var/lib/dpkg/info/stunnel4.prerm
/var/lib/dpkg/info/tdsodbc.prerm
/var/lib/dpkg/info/twm.prerm
/var/lib/dpkg/info/udev.prerm
/var/lib/dpkg/info/uucp.prerm
/var/lib/dpkg/info/vdr.prerm
/var/lib/dpkg/info/vtun.prerm
/var/lib/dpkg/info/xcursor-themes.prerm
/var/lib/dpkg/info/xserver-xorg.prerm
/var/lib/dpkg/info/zaptel.prerm
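The rough count behind this list (and the 122/47 figures mentioned further on) can be reproduced with a small function like the following. It is only a heuristic, for exactly the reason given above: a file containing both words doesn't prove the stop sits inside a `case` or `if`. The function name `count_prerms` is made up for this sketch; it defaults to dpkg's `/var/lib/dpkg/info` directory and uses GNU `xargs -r`.

```shell
#!/bin/sh
# count_prerms [dir]: print two numbers, the prerm scripts in dir that
# mention "stop", and how many of those also mention "remove".
# Heuristic only: a match doesn't show the stop is inside a case/if.
count_prerms() {
    dir="${1:-/var/lib/dpkg/info}"
    with_stop=$(grep -l 'stop' "$dir"/*.prerm 2>/dev/null | wc -l)
    # xargs -r (GNU): don't run the second grep if nothing matched
    with_both=$(grep -l 'stop' "$dir"/*.prerm 2>/dev/null \
        | xargs -r grep -l 'remove' | wc -l)
    # arithmetic expansion strips any padding some wc versions add
    echo "$((with_stop)) $((with_both))"
}
```

Running `count_prerms` with no argument checks the installed packages; the numbers will of course differ from system to system.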


gdm and xserver-xorg are particularly obvious cases. blindly running stop
in the prerm would kill the xterm (or GUI apt-get wrapper) that the
upgrade is being run from, while the upgrade is in progress.  this
is self-evidently WRONG behaviour.

similarly, stopping rsyslog during an upgrade is also self-evidently
wrong behaviour.



ppp is another obvious case. stopping ppp during an upgrade of a remote
system could kill the connection and thus the ssh session that the
upgrade is being performed on, during the upgrade - leaving no way to
get back in to complete the upgrade. that's self-evidently wrong.

ditto for openssh-server. 

in fact, i recall deliberately installing telnetd-ssl on systems, just
so that i had an alternate way to log in and complete the upgrade, until
the problem with the ssh package (as it was called back then) was fixed.
it was configured to only allow encrypted connections, and tcp-wrappers
was configured to only allow connections from known, trusted hosts.

a habit i retained for many years afterwards - i only stopped routinely
installing and configuring telnetd-ssl a few years ago, and i still
install it on systems where it is vital that i be able to login remotely
in case of emergency.




 Please show me your numbers.

it's not about numbers. it's about correct behaviour. it doesn't matter
if 10,000 other packages are doing the wrong thing - it's STILL the
wrong thing, it's still a bug, and it still should be fixed.

correct behaviour is to minimise downtime during an upgrade.  correct
behaviour is to NOT lose important, irreplaceable data.




FYI, of the 122 prerm scripts on my system that contain the text 'stop',
47 of them also contain the text 'remove', implying that they have
special-case code to handle upgrade

Bug#471051: rsyslog shuts down too early during upgrade

2008-03-15 Thread Craig Sanders
On Sun, Mar 16, 2008 at 03:14:58AM +0100, Michael Biebl wrote:
 Craig Sanders wrote:
  On Sun, Mar 16, 2008 at 01:16:22AM +0100, Michael Biebl wrote:
 
  
  One of the reasons why it is done this way is that you can't
  guarantee a clean shutdown of the daemon when you have replaced its
  files while it is still running.
  name one daemon where that's the case.
  I already gave you one.
  
  no, you didn't. you made an assertion without any evidence, not even
  a single example.

 Again: Imo a service should be stopped by the init script which
 was written for this specific version, because the maintainer can
 only test it for this version. The alternative would be that the
 maintainer tests the upgrade path from any previous version and adds a
 lot of special casing to the maintainer script (which *will* lead to
 errors, believe me).

dpkg passes more than enough information to the scripts for the script
to be able to make decisions about what to do.

see sections 6.5 and 6.6 of /usr/share/doc/debian-policy/policy.txt.gz,
from the debian-policy package.

(there are also pdf and html versions under the same directory)
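For example, a postinst is run as `configure <most-recently-configured-version>`, so with `dpkg --compare-versions` it can tell a first install from an upgrade, and an upgrade from an old version from one from a recent version. In this sketch the cut-off version `1.5-1` and the idea of a one-off migration are hypothetical placeholders for whatever special case a package actually needs.

```shell
#!/bin/sh
# Sketch: act on the arguments dpkg passes to postinst.
# "1.5-1" is a hypothetical cut-off version, not from any real package.
needs_migration() {
    old_version="$1"
    # empty on a first install: nothing to migrate
    [ -n "$old_version" ] || return 1
    # succeeds if the previously configured version is older than 1.5-1
    dpkg --compare-versions "$old_version" lt 1.5-1
}

if [ "$1" = configure ] && needs_migration "$2"; then
    echo "upgrading from $2: running one-off migration"
fi
```

Only the branch for old versions pays for the special case; upgrades from recent versions and fresh installs skip it entirely.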


a service should be stopped during upgrade ONLY if there is no other
alternative. even if that means writing a bit of code in the prerm or
postinst scripts.



btw, the only upgrade path that MUST be checked and handled as
appropriate is the upgrade from the previous stable release (incl. point
releases and security updates) to the current package version. people
who use testing or unstable understand and accept the risk of breakage
within the development cycle.

you can do better than that if you want, but that's optional.



 Second, if you replace files while the daemon is still running,
 this can lead to all sorts of subtle failures, e.g. daemons that
 dynamically load functionality via shared modules (as rsyslog does)
 might crash.

'MIGHT crash' is a whole lot better than 'definitely WILL be shut down
for the entire duration of the upgrade - many minutes or even hours(*)'.

with the former you have a chance of significant downtime during
upgrade.

with the latter, you are guaranteeing significant downtime during
upgrade.


(*) possibly days or longer in the case of remote upgrades of packages
like ppp or sshd, if they were to shut down during the upgrade. depends
on how far away the remote machine is and how long it takes to get
someone there to fix it at the console.

craig

-- 
craig sanders [EMAIL PROTECTED]



-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#468482: dlocate uninstallable in unstable

2008-02-29 Thread Craig Sanders
On Fri, Feb 29, 2008 at 11:50:19AM +0100, Hilmar Preusse wrote:
 dlocate actually depends on findutils (<< 4.2.31-2).

no, it depends on 'locate | findutils (<< 4.2.31-2)'

 [...]
 Depends: dctrl-tools | grep-dctrl (>= 0.11), dpkg (>= 1.8.0), locate |
 findutils (<< 4.2.31-2), perl
 [...]
 
 The version in unstable/testing however is 4.2.33-1 resp. 4.2.32-1, which
 makes your package uninstallable. No, findutils can't be removed, as it is
 essential.

could you show an example?

dlocate depends on either 'locate' or 'findutils (<< 4.2.31-2)'. that
was changed recently (Dec 2007) because locate and frcode were split out
from the findutils package into a separate package called 'locate'.

to put it another way: dlocate actually depends on the frcode program.
frcode used to be in the findutils package (<< 4.2.31-2), but is now in
the locate package.


so, either the 'locate' package or 'findutils (<< 4.2.31-2)' will satisfy
the dependency.
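To check by hand whether that alternative is satisfied on a given system, something like the following sketch would do. It uses `dpkg-query` and `dpkg --compare-versions`; the function names are made up for illustration, and it only looks at what is installed, not at what apt could install.

```shell
#!/bin/sh
# Sketch: is "locate | findutils (<< 4.2.31-2)" satisfied on this system?
installed_version() {
    # print the installed version of package $1, or nothing if it is
    # not (fully) installed; Status is e.g. "install ok installed"
    dpkg-query -W -f='${Status} ${Version}\n' "$1" 2>/dev/null \
        | awk '$3 == "installed" { print $4 }'
}

findutils_has_frcode() {
    # frcode was still shipped in findutils before 4.2.31-2
    [ -n "$1" ] && dpkg --compare-versions "$1" lt 4.2.31-2
}

frcode_dep_satisfied() {
    # first alternative: the locate package
    [ -n "$(installed_version locate)" ] && return 0
    # second alternative: an old enough findutils
    findutils_has_frcode "$(installed_version findutils)"
}
```

With findutils 4.2.33-1 or 4.2.32-1 only the first alternative can match, so `frcode_dep_satisfied` then depends entirely on the locate package being installed.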

see Bug #453952 for more details.


btw, what are you using? apt-get? aptitude? if you're describing an
actual event rather than just theory, try with the other one. if that
works, then i'd suspect that there might be a bug in the dependency
resolution of the other.




BTW, in case you're worried about incompatibility with mlocate, that
package does not conflict with the locate package, although you do
have to take steps to stop both locate and mlocate from indexing the
filesystems. see Bug #454471 for more details. see also Bug #454106, one
of the messages contains a useful method for generating the locate db
from the mlocate db.




 P.S.: there are a lot of bugs on your package that have the tag fixed,
 but were never closed.

hmmm. i guess i've just assumed all along that Fixed is equivalent to
Closed, and they don't actually need to be closed. i've adopted all the
NMU changes, so i'll just close all the Fixed bugs next time i upload an
update.


craig

-- 
craig sanders [EMAIL PROTECTED]






Bug#443674: two more data points

2007-10-03 Thread Craig Sanders
and two more data points i forgot to mention in my last message.

i have two other machines here, also with nvidia cards.

1.  hex is an ancient amd-2000 with an nv geforce 6200.

01:00.0 VGA compatible controller: nVidia Corporation NV44A [GeForce 6200] (rev a1)

it has been running without error and without reboot for the last week
on kernel 2.6.22.6, with nvidia-kernel driver 100.14.11 and nvidia-glx
100.14.11

i decided to reboot it today with kernel 2.6.22.9 and nvidia 100.14.19.  it
has been working fine for about 8.5 hours so far.



2. kali is an amd64 with an nv geforce fx 5200

01:00.0 VGA compatible controller: nVidia Corporation NV34 [GeForce FX 5200] (rev a1)

kali never got upgraded to 2.6.22.6 and nv 100.14.11 because of the problems
i had with my main machine.

however, i installed a new kernel and rebooted it earlier today with
2.6.20.20 and nvidia 100.14.19. no problems at all (just over 9 hours).




so far, the common factor in crashing machines seems to be the nvidia
7300 GS chipset combined with nvidia drivers 100.14.11 or 100.14.19.



craig

-- 
craig sanders [EMAIL PROTECTED]

The fundamentalists deny that evolution has taken place; they deny that
 the earth and the universe as a whole are more than a few thousand years
 old, and so on. There is ample scientific evidence that the fundamentalists
 are wrong in these matters, and that their notions of cosmogony have about
 as much basis in fact as the Tooth Fairy has.
  [Isaac Asimov, quoted in 2000 Years of Disbelief,
   Famous People with the Courage to Doubt, by
   James A. Haught, Prometheus Books, 1996]






Bug#443674: same problem here

2007-10-03 Thread Craig Sanders
another data point re: nvidia 100.14.11 and 100.14.19 drivers


i'm not sure if this is an nvidia-glx bug, an nvidia-kernel-* bug, or a
kernel bug. it certainly shouldn't be classed as a browser bug because
applications shouldn't be able to lock the system solid.

my very strong suspicion is that it's actually the nvidia-kernel-*
driver which is at fault. see below for why.

i also had the same problem with version 100.14.11

like Hans, my entire system locks up after moderate use of the web
browser (either mozilla or iceweasel - haven't tried others). scrolling
a page back and forth, for example, will trigger it very quickly.



here's why i think it's the kernel driver:

i have been running kernel 2.6.20.14 with nvidia-kernel 100.14.09 for 2+
months without problems, without rebooting.

about a week ago, i had a power failure and took the opportunity to
reboot into a new 2.6.22.6 kernel. i compiled nvidia-kernel 100.14.11 to
go with it, and upgraded nvidia-glx at the same time.

the machine locked solid while using iceweasel within an hour of
booting. rebooted again. it locked again while using iceweasel, also in
less than an hour.

i rebooted again and selected the old kernel (2.6.20.14) from the
grub menu. the machine ran fine until today (when a hard disk problem
required me to reboot).

thinking the problem might have been with the 2.6.22.x kernel, i tried
compiling 2.6.20.20 and nvidia-kernel driver 100.14.19, and again
upgraded nvidia-glx to suit. again, the machine locked while using a
browser (iceape, this time).

i rebooted back to 2.6.20.14 and the machine has been running fine
since...nearly 5 hours now.

note: reverting to the old nvidia-kernel driver (100.14.09) while
keeping the newer nvidia-glx package causes warnings about mismatched
versions to appear in /var/log/kern.log, but everything works fine. that
was how i ran it for the last week until today, and it's how i've been
running it for the last 5 hours.

Oct  3 18:48:42 ganesh kernel: [  200.629000] NVRM: API mismatch: the client has the version 100.14.19, but
Oct  3 18:48:42 ganesh kernel: [  200.629000] NVRM: this kernel module has the version 100.14.09.  Please
Oct  3 18:48:42 ganesh kernel: [  200.629000] NVRM: make sure that this kernel module and all NVIDIA driver
Oct  3 18:48:42 ganesh kernel: [  200.629000] NVRM: components have the same version.



fortunately, the latest nvidia-glx package's dependency on the matching
nvidia-kernel package version only requires that the nvidia-kernel-*
package be installed - it doesn't force you to use that nvidia.ko kernel
module.


so, Hans, there's another workaround for you. use an older kernel and
the 100.14.09 nvidia-kernel driver, with the new nvidia-glx package.
works fine.




note: in all cases, i used stock linux kernel source from kernel.org and
NOT the debianised kernel sources. i don't and won't use them.

craig

ps: my video hardware is also a 7300GS, and my system is an athlon64 am2
X2 (amd64 CPU but running 32-bit i386 debian as this machine has been
upgraded for years and there isn't an upgrade path to 64-bit debian)
with 4GB RAM.


# lspci -v -s 02:00.0
02:00.0 VGA compatible controller: nVidia Corporation GeForce 7300 GS (rev a1) (prog-if 00 [VGA])
Subsystem: ASUSTeK Computer Inc. Unknown device 81f3
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at fa00 (32-bit, non-prefetchable) [size=16M]
Memory at e000 (64-bit, prefetchable) [size=256M]
Memory at fb00 (64-bit, non-prefetchable) [size=16M]
[virtual] Expansion ROM at fcfe [disabled] [size=128K]
Capabilities: [60] Power Management version 2
Capabilities: [68] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
Capabilities: [78] Express Endpoint IRQ 0
Capabilities: [100] Virtual Channel
Capabilities: [128] Power Budgeting

# uname -a
Linux ganesh.taz.net.au 2.6.20.14 #1 SMP PREEMPT Sun Jul 1 10:09:39 EST 2007 i686 GNU/Linux

# cat /proc/version 
Linux version 2.6.20.14 ([EMAIL PROTECTED]) (gcc version 4.1.3 20070601 (prerelease) (Debian 4.1.2-12)) #1 SMP PREEMPT Sun Jul 1 10:09:39 EST 2007


-- 
craig sanders [EMAIL PROTECTED]

BOFH excuse #177:

sticktion


