particularly @Scott, @Steve

Since I've now hit this bug again, I had another read over the bug
thread...

Here are some thoughts... which may be substantially wrong, but hey.


There seems to be a disconnect here between the true preconditions for some 
jobs, and the kind of preconditions specifiable for upstart jobs in general.

A fundamental problem, if I understand the situation correctly, is that
we have cases where the events (things happening dynamically during
boot) are not adequate to determine whether/when upstart should consider
a job startable, at least at the level of the simple boolean
combinations etc. that upstart currently understands.

The startability of some jobs depends on other factors (in this case,
static system configuration which the administrator expects to customise
--- the fstab).  If upstart is conservative and waits until _everything_
is mounted, we will fail in some cases, for example when there are NFS
mounts in fstab.  Alternatively, if upstart is aggressive and tries to
start the statd job as soon as it is _probably_ startable, then it might
fail to start, and there's not much we can do about it -- that seems to
be the current behaviour.

This is a problem because upstart doesn't currently have any sensible
methodology for retrying failed jobs.  So we either need a way to retry
jobs at sensible times, or a more expressive way to determine when jobs
should be started.

Conservative approach
==================
The "conservative" approach would be this approximation (which seems to work 
for me):

    start on (filesystem and (started portmap or mounting TYPE=nfs))

...because "filesystem" really does mean that the whole FHS tree has been
mounted, and that the contents of /var are real (not just a stub
mountpoint).  This won't work for anyone who uses NFS for a mountpoint
within the FHS (even if it's not /var and not otherwise needed for
launching statd), and probably won't work if an NFS filesystem is listed
in /etc/fstab (?) - but it shouldn't cause extra problems when using
nfsroot, since the kernel's internal statd is used in that case (I
think?).

Better approach?
==============
Ideally, we could write something like:

    start on (mounted-final MOUNTPOINT=/var) and (started portmap or mounting TYPE=nfs)

Where "mounted-final MOUNTPOINT=<path>" means that all necessary mounts
have been done to populate <path> with its "real" FHS contents, and the
boot process won't mount anything else on top.

This could be implemented in a practical way in mountall if we don't
attempt to make it universal--- i.e., we don't ensure that it works for
every possible <path>, but we do make it work for top-level directories
defined by the FHS.  To emit these events, mountall must parse the
whole fstab and then act appropriately on each mount:

  * When <path> is mounted:
      * emit mounted MOUNTPOINT=<path>
      * for d in {each FHS top-level dir}:
            if no explicit mount for d or a parent of d in fstab:
                emit mounted MOUNTPOINT=<d>
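The emission logic above could be sketched roughly as follows.  This is a
hedged illustration, not mountall's real code: the sample fstab, the
`in_fstab` and `on_mounted` helper names, and the `echo`ed "emit" lines (in
place of real initctl calls) are all my own invention.

```shell
#!/bin/sh
# Sketch: when a mount point is mounted, emit "mounted" for it; when / is
# mounted, also emit "mounted" for every FHS top-level dir that has no
# explicit fstab entry (those live on the rootfs, so they are final then).

FSTAB=$(mktemp)
cat > "$FSTAB" <<'EOF'
/dev/sda1   /      ext4  defaults  0 1
server:/v   /var   nfs   defaults  0 0
EOF

FHS_DIRS="/bin /boot /etc /home /lib /opt /srv /tmp /usr /var"

# in_fstab DIR: true if fstab has an explicit entry for DIR.
# (For a top-level dir the only proper parent is /, so checking DIR suffices.)
in_fstab() {
    awk -v m="$1" '$1 !~ /^#/ && $2 == m { found = 1 } END { exit !found }' "$FSTAB"
}

# Called once the named mount point has actually been mounted.
on_mounted() {
    echo "emit mounted MOUNTPOINT=$1"
    if [ "$1" = / ]; then
        for d in $FHS_DIRS; do
            in_fstab "$d" || echo "emit mounted MOUNTPOINT=$d"
        done
    fi
}

on_mounted /      # emits for / and for every listed dir except /var
on_mounted /var   # emits for /var once the NFS mount actually lands
```

The point of the `/`-only branch is that directories with no fstab entry of
their own become "final" as soon as the rootfs is up, whereas anything with
an explicit entry only becomes final when its own mount completes.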

General approach
==============

The above feels a bit messy and fragile, and doesn't solve the general
problem of configuration-dependent job start preconditions. So, it might
be better to implement outside mountall, by extending upstart with some
extra flexibility for job start conditions.  For example:

    start on $eval(mounted-final /var) and (started portmap or mounting TYPE=nfs)

...where $eval(<command> <arguments>) is some magic new upstart event
expression syntax which runs an arbitrary command or script and uses its
output as part of the event expression.  [I'm not suggesting exactly
that syntax of course -- I admit it's pretty hideous ;P]

In our case, "mounted-final <path>" is some widget which returns the
event expression "mounted MOUNTPOINT=<x>", where <x> is the deepest path
listed in fstab that is a parent of, or is equal to, <path>.  This is
pretty easy to script up.  So it really depends on whether upstart
can/should be extended to support this kind of thing.
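To back up the "pretty easy to script up" claim, here is one way such a
widget could look.  This is a hedged sketch: the `mounted_final` function
name and the sample fstab are mine, and a real version would read the
system's /etc/fstab rather than a temp file.

```shell
#!/bin/sh
# Sketch: print "mounted MOUNTPOINT=<x>" where <x> is the deepest mount
# point in fstab that equals, or is a parent of, the given path.

FSTAB=$(mktemp)
cat > "$FSTAB" <<'EOF'
/dev/sda1    /          ext4  defaults  0 1
server:/v    /var       nfs   defaults  0 0
server:/m    /var/mail  nfs   defaults  0 0
EOF

mounted_final() {
    path=$1
    while :; do
        # Is there an explicit fstab entry for $path itself?
        if awk -v m="$path" '$1 !~ /^#/ && $2 == m { f = 1 } END { exit !f }' "$FSTAB"
        then
            echo "mounted MOUNTPOINT=$path"
            return 0
        fi
        [ "$path" = / ] && return 1   # nothing listed, not even /
        path=$(dirname "$path")       # walk up towards /
    done
}

mounted_final /var/lib/nfs   # -> mounted MOUNTPOINT=/var
mounted_final /usr/share     # -> mounted MOUNTPOINT=/
```

Walking up with dirname guarantees we report the deepest listed ancestor,
so /var/lib/nfs resolves to /var here while /usr/share falls through to /.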

"Retry" approach
==============
Finally, it might be interesting to consider whether it makes sense to define 
specific retry conditions for jobs.  This would allow us to do better at 
retries than dumb polling.  I don't remember exactly the pattern-matching 
capabilities for event key values, but I can imagine something like:

    retry on (mounted MOUNTPOINT=/var) or (mounted MOUNTPOINT=/var/*)
    retry on (mounted MOUNTPOINT=/var(/.*)?)  # if regex is supported?

It still feels a bit wrong, though... statd may spuriously start
successfully if the rootfs contains /var/lib/nfs but the real /var is
subsequently mounted on top of it (perhaps after statd has started).
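One way a pre-start check could guard against that spurious case (a sketch
of an idea, not an existing upstart feature): refuse to proceed while any
fstab mount at or under /var is missing from the mount table.  The
`var_is_final` name and the sample fstab/mounts files are my own; a real
check would read /etc/fstab and /proc/mounts.

```shell
#!/bin/sh
# Sketch: succeed only once every fstab entry at or under /var is actually
# mounted, so statd cannot start against a stub /var/lib/nfs on the rootfs.

FSTAB=$(mktemp); MOUNTS=$(mktemp)
cat > "$FSTAB" <<'EOF'
/dev/sda1  /     ext4  defaults  0 1
server:/v  /var  nfs   defaults  0 0
EOF
cat > "$MOUNTS" <<'EOF'
/dev/sda1 / ext4 rw 0 0
EOF

var_is_final() {
    # Every /var (or /var/...) mount point listed in fstab...
    for m in $(awk '$1 !~ /^#/ && $2 ~ /^\/var(\/|$)/ { print $2 }' "$FSTAB"); do
        # ...must already appear in the mount table.
        grep -q " $m " "$MOUNTS" || return 1
    done
    return 0
}

var_is_final && echo "safe to start statd" || echo "wait: /var not final yet"
```

Here the check fails until the NFS /var mount shows up in the mount table,
which is exactly the window in which starting statd would be premature.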

If a retry feature is added, it would be wise to limit the maximum
number of retries (as for respawn) or the maximum time period over which
retries will be attempted.


Thoughts?


** Patch added: "my conservative workaround"
   
https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/525154/+attachment/1578208/+files/statd.diff

-- 
mountall for /var races with rpc.statd
https://bugs.launchpad.net/bugs/525154
