Bug#904558: What should happen when maintscripts fail to restart a service

2018-09-18 Thread Gunnar Wolf
Stuart Prescott dijo [Wed, Sep 19, 2018 at 12:18:24PM +1000]:
> (...)
> That was perhaps also written before we started to realise that maintainer 
> scripts are actually best avoided as they tend to be complicated, fragile, 
> difficult to do right and make upgrades harder for the package manager. In 
> the intervening two decades, we've gone from "maintainer scripts are cool" 
> to "the best maintainer script is the one that doesn't exist".
> 
> So yes, ignoring errors seems wrong but…
> (...)
> … causing a snowball of errors in an awkward half-upgraded environment is 
> nasty.
> 
> The problem comes when you don't yet have the right tools installed to be 
> able to fix the problem. We see that scenario often enough in #debian where 
> someone has a failed upgrade and we try to collect more information via 
> pastebinit, strace, traceroute, netcat, gdb, etc; we frequently discover 
> that the relevant tool isn't installed and because apt is sufficiently 
> unhappy about broken packages and a half-completed upgrade, you can't ask it 
> to install the tool at that point in time.
> 
> In the upgrade scenario, while you're trying to fix one particular problem, 
> you're also in a completely untested half-upgraded situation and so latent 
> bugs in any number of other tools may also be exposed.
> 
> So while ignoring errors is wrong, so is making it harder to fix them. This 
> isn't a question of absolutes.

I completely agree with Stuart here. Yes, of course, there is a reason
for maintainer scripts to exist, and if they fail to set up things
around the package, of course, the user _needs_ to know something is
off in their system.

But that should happen _very_ seldom. As Stuart says, helping
non-technical users out of this situation can be quite hard, and quite
discouraging for the user. We have to make sure the scripts are as
foolproof as possible; failing to stop or restart a daemon
should _never_ cause the system to enter such a state.



Bug#904558: What should happen when maintscripts fail to restart a service

2018-09-18 Thread Stuart Prescott
Ian Jackson wrote:
>> I personally think that it would make sense for the policy to at least
>> recommend what should happen with regards to maintainer scripts and
>> typical operations that are performed in them.
> 
> There is already a section on error handling in scripts, which (IMO
> correctly) says that shell scripts should use set -e.
> 
> When I wrote that, it didn't occur to me that anyone would think that
> a failure by a postinst script to perform an intended operation should
> be treated any other way than a failure of the postinst script.

That was perhaps also written before we started to realise that maintainer 
scripts are actually best avoided as they tend to be complicated, fragile, 
difficult to do right and make upgrades harder for the package manager. In 
the intervening two decades, we've gone from "maintainer scripts are cool" 
to "the best maintainer script is the one that doesn't exist".

So yes, ignoring errors seems wrong but…


>> And, while I'm open to be convinced otherwise, I don't see any benefit
>> from postinst (particularly postinst + configure) ever failing.
> 
> Frankly I'm disturbed to be reading this, here.  See above.
> 
> If the postinst fails, then the user has the opportunity to fix the
> root cause and rerun dpkg --configure --pending.  That will
> then repair the system completely.

… causing a snowball of errors in an awkward half-upgraded environment is 
nasty.

The problem comes when you don't yet have the right tools installed to be 
able to fix the problem. We see that scenario often enough in #debian where 
someone has a failed upgrade and we try to collect more information via 
pastebinit, strace, traceroute, netcat, gdb, etc; we frequently discover 
that the relevant tool isn't installed and because apt is sufficiently 
unhappy about broken packages and a half-completed upgrade, you can't ask it 
to install the tool at that point in time.

In the upgrade scenario, while you're trying to fix one particular problem, 
you're also in a completely untested half-upgraded situation and so latent 
bugs in any number of other tools may also be exposed.

So while ignoring errors is wrong, so is making it harder to fix them. This 
isn't a question of absolutes.

cheers
Stuart

-- 
Stuart Prescott    http://www.nanonanonano.net/   stu...@nanonanonano.net
Debian Developer   http://www.debian.org/         stu...@debian.org
GPG fingerprint    90E2 D2C1 AD14 6A1B 7EBB 891D BBC1 7EBB 1396 F2F7



Bug#904558: What should happen when maintscripts fail to restart a service

2018-09-18 Thread Tollef Fog Heen
]] Ian Jackson 

Hi,

> There may be good reasons not to treat daemon startup failure as a
> postinst failure, but the argument above is not one of them.

I think this is the core question.  I largely agree with Ian here that
having postinsts fail is not that big a deal if they can't make forward
progress, but also we're being asked to advise on what happens when a
maintainer script fails to restart a service.  I disagree with him on
whether failure to start/restart a service should be considered a
configuration failure.

The API provided by a package being in the configured state is not
whether the relevant daemon is running or not; that is runtime and can
and will change many times while the package is in the configured state,
so dpkg dependencies are not useful for expressing «this service must be
running».  (There's also the case where the service is running on a
separate host, which is often the case for services such as databases
and where the use of Depends is inappropriate.)

I think the general rule should be that the success/failure of the
postinst script should signal whether the package considers itself ready
to provide whatever API it exists to provide (disregarding the case of
Essential packages here, since those are special).

This means that failure to start a daemon should generally not cause the
postinst to fail.  At the same time, I think there are exceptions to
this rule that should be left to maintainer judgement: sshd comes to
mind as a service where if it can't restart, you want the system to make
it very clear that something is wrong that you might want to fix sooner
rather than later (since failure to do so can lead to you not being able
to access it after a reboot).
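
A minimal shell sketch of that rule, under stated assumptions: the
restart_service helper and its yes/no "critical" argument are hypothetical
and are not part of any Debian tool. Treating a critical service's restart
failure as a postinst failure is only one way a maintainer might exercise
the judgement described above.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
#
# restart_service CRITICAL CMD...
#   Runs CMD (the command that restarts the service).
#   If CMD fails, a warning is printed on stderr. The failure is passed on
#   to the caller only when CRITICAL is "yes".
restart_service() {
    critical=$1
    shift
    if "$@"; then
        return 0
    fi
    echo "warning: service restart failed: $*" >&2
    if [ "$critical" = "yes" ]; then
        # Assumption: the maintainer decided this service is too important
        # to leave silently stopped, so the script should fail.
        return 1
    fi
    # Ordinary daemon: the package still counts as configured.
    return 0
}

# Possible use in a postinst ("example-daemon" is a made-up name):
#   restart_service no invoke-rc.d example-daemon restart
```

With CRITICAL set to "no" the helper always returns success, so a script
running under set -e continues; with "yes" a failed restart stops the
script.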

-- 
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are



Bug#904302: Whether vendor-specific patch series should be permitted in the archive

2018-09-18 Thread Tollef Fog Heen
]] Philip Hands 

> Tollef Fog Heen  writes:
> 
> >This should be implemented in Debian Policy by declaring that a a
>^^^
> You've this doubled 'a' on two occasions in this text.

I'll fix that, thanks for spotting it.

> Presumably we would not want to see new packages adopting the use of
> vendor-specific patch series prior to Buster.
> 
> Do we need to make the "SHOULD NOT" conditional on the package already
> having a vendor-specific patch series at the time of this resolution?

I think that just adds needless complexity and assumes that maintainers
will want to add bugs to their package.  I really hope that's not the
case, so I don't think it's worthwhile to add extra language for it.

-- 
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are



Bug#904558: What should happen when maintscripts fail to restart a service

2018-09-18 Thread Ian Jackson
Margarita Manterola writes ("Bug#904558: What should happen when maintscripts
fail to restart a service"):
> Sorry that it took so long to get back to this bug.  The other bug took
> all the attention.
...
> If a postinst fails (for whatever reason), the package is left in a
> broken state (Failed-Config) which in general makes the package
> management system unhappy.

The other effect is that the package's dependencies are not
configured, so their postinsts do not experience a broken situation.

> It seems that the only reason why one may want to do this is to call
> the attention of the sysadmin so that they can solve the problem.
> However, in a world where a large number of users are running automatic
> updates, leaving the package management system in a broken state is
> pretty sad, not very visible and rather confusing for the user when
> they finally encounter it.
> 
> Is there another use case for leaving the package in Failed-Config
> that we missed?

If you deliberately cause the postinst to succeed when the package is
nonfunctional, then the package's r-dependencies will be configured
(ie have their postinsts run) in the broken state.

The r-dependencies' postinsts may then do wrong things.  They may
leave the r-dependencies in anomalous states.  If one takes the
argument you make above to its logical conclusion, all those postinsts
should also report success.

The result is a system where the only thing that is happy is the package
management system, and the records of the root cause of the problem,
and how the failed operations might be reattempted, have been lost.

I guess you will infer from what I write above that I consider
"reporting errors causes the next layer to be unhappy" and "reporting
errors causes the user to be unhappy" to be extraordinarily bad arguments.

There may be good reasons not to treat daemon startup failure as a
postinst failure, but the argument above is not one of them.

> It's unclear why the service (re)start needs to be a special case.

Service (re)starts are more likely to fail for unrelated reasons.
Also some packages are able to provide much of their intended API even
without the daemon.

I think the general rule of thumb should be that a daemon startup
failure should be treated as a configuration failure.

I'm content with a situation where maintainers feel free to diverge
from this if there are reasons to do so.

> I personally think that it would make sense for the policy to at least
> recommend what should happen with regards to maintainer scripts and
> typical operations that are performed in them.

There is already a section on error handling in scripts, which (IMO
correctly) says that shell scripts should use set -e.

When I wrote that, it didn't occur to me that anyone would think that
a failure by a postinst script to perform an intended operation should
be treated any other way than a failure of the postinst script.

(In the usual case.  There are of course lots of situations where the
right approach is some kind of error recovery, or the operation was
attempted "just in case", or something, in which case more subtle
error handling is called for.)
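
The difference between the usual case and a "just in case" operation can
be sketched in shell. This is a minimal illustration, not text from
Policy: the two script bodies and the run_script helper are hypothetical,
and `false` stands in for whatever operation fails.

```shell
#!/bin/sh
# Hypothetical sketch: two postinst-style script bodies, kept as strings so
# that each one runs in its own shell process via run_script.

# Usual case: under "set -e" the first failing command aborts the script,
# so the failure becomes the exit status of the script itself.
strict_body='
set -e
false              # stand-in for an intended operation that fails
echo "configured"  # not reached
'

# "Just in case" operation: the failure is handled explicitly, so "set -e"
# does not abort and the script carries on.
tolerant_body='
set -e
false || echo "warning: optional step failed" >&2
echo "configured"
'

# run_script BODY: execute BODY in a fresh shell and return its exit status.
run_script() {
    sh -c "$1"
}
```

Running run_script "$strict_body" returns a non-zero status and prints
nothing, while run_script "$tolerant_body" prints a warning on stderr,
then "configured", and returns success.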

> And, while I'm open to be convinced otherwise, I don't see any benefit
> from postinst (particularly postinst + configure) ever failing.

Frankly I'm disturbed to be reading this, here.  See above.

If the postinst fails, then the user has the opportunity to fix the
root cause and rerun dpkg --configure --pending.  That will
then repair the system completely.

Ian.

-- 
Ian Jackson   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.



Next Meeting - Wednesday, September 19th 19:00 UTC (tomorrow)

2018-09-18 Thread Margarita Manterola

Dear Technical Committee members,

Our monthly meeting will take place tomorrow at 19:00 UTC.

These are the items on the agenda (also committed to git):
 * Review of previous meeting AIs
 * #904302 Whether vendor-specific patch series should be permitted in
   the archive
 * #904558 What should happen when maintscripts fail to restart a
   service

 * Additional Business

See you there!

--
Regards,
Marga