Re: [RFD] Configuration spec: "rc", "env" and "hostname"

Martin Lucina Mon, 01 Feb 2016 03:22:28 -0800

On Friday, 29.01.2016 at 11:32, Antti Kantee wrote:
> [quoting a number of mails.  hopefully it's parseable]

No problem for me, and easier than looking up replies in multiple
sub-threads. Thanks!

> 
> On 26/01/16 13:25, Martin Lucina wrote:
> >## rc: Program invocation
> 
> Is this block intended to work in kernonly mode, or with userspace only?

It *should* work in kernonly mode, with the exception that if we end up
specifying a per-process "env" block for each "rc" entry, then that will
have to be emulated somehow.

Regarding kernonly mode in general, I've not really thought through what
facilities (including a config parser) should or should not be provided.
I'd be interested in knowing more about your plans and thoughts on this
(possibly for a different subthread?)

> >     "rc": [
> >          {
> >              "bin": <string>,
> >              "argv": [ <string>, ... ],
> >              "runmode": "" | "&" | "|"
> >          },
> >          ...
> >     ]
> 
> One thing that Vincent brought up was rlimits.  Should there be a sysctl
> block in here, or should we call it rlimit?  pwd that David mentioned is
> similar.  Are there any other per-process things?  We don't have to think of
> all of them this moment, but the more we think of them, the more confident
> we can be in things being extensible in the future.

Unless I'm missing some NetBSD-specific uses of sysctl, sysctl is global,
am I right?  Therefore I wouldn't expect users to need a per-"rc" sysctl
block.

For rlimits, off the top of my head, we could do something like this:

"rc": [
  {
    "rlimit": {
      "core": ...,
      "cpu": ...,
      "data": ...,
      "nofile": ...,
    }
  }
]

(The keys in "rlimit" would correspond to RLIMIT_XXX from setrlimit().)
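
As a rough sketch of that key mapping (Python purely for illustration, not
the rumprun implementation): it assumes the elided values are plain
integers that replace the soft limit, and the helper names
rlimit_constant() and apply_rlimits() are made up for this example.

```python
import resource


def rlimit_constant(key):
    """Map a config key such as "nofile" to the matching RLIMIT_XXX constant."""
    name = "RLIMIT_" + key.upper()
    if not hasattr(resource, name):
        raise ValueError("unknown rlimit key: %r" % key)
    return getattr(resource, name)


def apply_rlimits(rlimits):
    """Apply each entry of a hypothetical "rlimit" block as a new soft limit."""
    for key, value in rlimits.items():
        res = rlimit_constant(key)
        _soft, hard = resource.getrlimit(res)
        # Assumption: the config value replaces the soft limit only and the
        # hard limit is left untouched.
        resource.setrlimit(res, (value, hard))
```

An unknown key is rejected up front rather than ignored, which seems the
safer default for a spec that treats undocumented behaviour as unofficial.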

> >* _argv[]_: Argument list passed to program. At a minimum, a single string
> >   is required, which will be passed to the program as _argv[0]_.
> 
> Would it make sense to default argv[] to binname if argv is not present?  I
> can see both pros and cons.  How about if it's implemented but not
> documented?  (not that we need to worry about undocumented things in this
> thread, but the question is more if we should document it)

What's the motivation for this? A "developer" shortcut to save typing?
There are already enough alternative behaviours in rc[] (e.g. when none is
passed at all), that I'd prefer not adding more. Having said that, argv[] =
[binname] is consistent with precisely the "no rc" case, so I have no strong
opinion either way.

Regarding undocumented behaviour, I've deliberately written the following
text about undocumented behaviours into the spec:

    Configuration interfaces and/or behaviours not documented here are
    considered unofficial and experimental, and may be removed without
    warning.

Therefore, in the context of this discussion, if we implement it then it
should be documented.
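
For the record, the shortcut itself would be small. A minimal sketch in
Python (illustration only; effective_argv() is a hypothetical name, and
the default branch is the behaviour under discussion, not something the
spec currently defines):

```python
def effective_argv(entry):
    """Return the argument list for one "rc" entry."""
    argv = entry.get("argv")
    if argv is None:
        # Hypothetical shortcut under discussion: argv[] defaults to [bin].
        return [entry["bin"]]
    if not argv:
        # The spec requires at least one string, which becomes argv[0].
        raise ValueError("argv must contain at least argv[0]")
    return list(argv)
```

Note that an explicitly empty argv[] is still an error here; only a
missing key triggers the default.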

> On 27/01/16 10:41, Martin Lucina wrote:
> >Rumprun-bake could do that, I just hadn't done it in my initial
> >implementation since implementing "is this set of names unique" in bash is
> >a PITA.
> 
> The following should work:
> ubins="${bins}"
> _uniq ubins
> [ "${bins}" = "${ubins}" ] || die not unique
> 
> (I guess _uniq could have a smarter calling convention to make that a
> one-liner instead of a 3-liner PITA)

Thanks for that, will do.

> On 28/01/16 10:17, Martin Lucina wrote:
> >"Working directory" is very much a setting, so yeah. However, we need to
> >account for a default otherwise a previous workdir would affect the next
> >invocation(s), so I propose:
> >
> >"workdir": <string> (optional, default: "/")
> >
> >This means that there's an implicit chdir("/") done before each invocation,
> >unless you specify a workdir.
> 
> Well, first of all, if you implement that approach sanely, you don't need an
> "implicit chdir".

Good point, ack.
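
To check my understanding, a sketch in Python (illustration only;
launch_plan() is a made-up name): each invocation's working directory is
resolved from its own entry, so no state carries over and no chdir("/")
is needed in between.

```python
def launch_plan(rc):
    """Pair each "rc" entry's binary with its own working directory."""
    plan = []
    for entry in rc:
        # Resolved per entry, so an earlier entry's workdir never carries
        # over to the next one; "/" is the proposed default.
        plan.append((entry["bin"], entry.get("workdir", "/")))
    return plan
```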

> However, is that approach really the most natural way of thinking about
> working directories?  Why not make it behave like "cd" on a shell?

Not sure what you mean by that, can you elaborate?

> On 26/01/16 13:25, Martin Lucina wrote:
> >## env: Environment variables
> >
> >Open issues:
> >
> >- Justin Cormack mentioned that the "env" key should be made "per-process",
> >   i.e. each "rc" entry should have its own "env" subkey.
> >
> >   I don't think this is possible or desirable as:
> >
> >   a) The environment is a libc construct, so it's not applicable to e.g.
> >   kernonly mode.
> 
> If that's an argument, why have the env block at all?

Because it's useful for a known case with many examples (existing
unmodified software which accepts e.g. verbosity / debug / other toggles
via the environment).

> >   b) An "rc" invocation is not a "process" and as such does not have its
> >   own address space. The environment is a global construct in libc, thus
> >   cannot be made "per-process" in rumprun without patching libc which we
> >   want to minimize.
> 
> I don't think your definition of "process" matches mine.
> 
> As long as the "global" environ *symbol* is reachable only from one process, it
> should work just fine.  Don't need a patched libc, just need a libc per
> process.  That's a good idea anyway for other globals like stdio.  I think
> it *should* work, but it's a different story what sort of can of worms it
> might be.  Only one way to find out.  For extra credit, shared text/rodata,
> which should also be possible without much lost hair.

Reading the libc implementation (stdlib/_env.c), it's not just environ but
also all the other static symbols in that module. So, at the very least
we'd have to patch/provide a replacement _env.c to use __thread for its
static data, make sure that the right thing gets done whenever a new
process context is initialised *and* figure out how to do all that in a way
that's acceptable to upstream.

That seems like a bunch of work solving a problem I don't see an
immediate need for?
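
To illustrate what happens today, with one shared copy of that state, here
is a toy sketch in Python (os.environ stands in for libc's global
environment; the per-entry env and the RC_DEMO_ names are hypothetical):

```python
import os


def invoke(env):
    """Apply one entry's hypothetical per-"rc" env, then report what it sees."""
    os.environ.update(env)
    return {k: v for k, v in os.environ.items() if k.startswith("RC_DEMO_")}


first = invoke({"RC_DEMO_A": "1"})
second = invoke({"RC_DEMO_B": "1"})
# "second" also sees RC_DEMO_A, because both share the one global environment.
```

So a per-"rc" env key would silently leak into later invocations unless
the state is actually separated, which is the work described above.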

> >## hostname: Kernel hostname
> >
> >     "hostname": <string>
> >
> >* _hostname_: Sets the hostname returned by the `gethostname()` call.
> >
> >- Nothing controversial here ...
> 
> Does that need its own top-level entity?  If we're adding a sysctl block,
> you can just set kern.hostname there.
> 
> The only advantage I see is that if we ever get a non-NetBSD backend (which
> I hope we do, though not holding my breath just yet), the above
> representation is portable.  However, I'm not convinced that the fs/net
> blocks will be portable, so maybe it's not a goal worth shooting for.

I'd keep hostname in the top-level for extra visibility, rather than
"hiding" it in a sysctl tree.

Don't know about portability to other backends -- again, relevant also to
Mirage/Rump hybrid unikernels and kernonly, need to think about it more. A
left-handed attempt for a portable sysctl block would be to put it under a
"netbsd" top-level block.

> General: are you updating the spec somewhere as the discussion progresses?
> Since I took a few days, I'm not sure I commented on the current proposal.

Yes, both the spec and the corresponding implementation are being kept up
to date on the "mato-wip-rumprun-config" branch. Direct link to the spec
here:

https://github.com/rumpkernel/rumprun/blob/mato-wip-rumprun-config/doc/config.md
