Re: [pkg-discuss] initial linked images design doc

Shawn Walker Mon, 24 May 2010 13:41:01 -0700

On 03/ 7/10 12:23 PM, Edward Pilatowicz wrote:

hey all,


so in an effort to get pkg image-update to work with zones, i've been
prototyping support for linked images within pkg(5). i don't have a fully
functional prototype to pass around yet, but i have documented some
aspects of my prototype in an initial design doc. i've attached that doc
so that people can take a look, see what i'm thinking, provide feedback,
etc. all comments and criticism are welcome.


lines 10-34:
    So are linked image types just *additional* types of images (e.g.
    full, partial, etc.) or is this a subtype of an existing image type?

lines 164-205: User linked images:
    I'd like to make the observation that it seems like user and zone
    linked images fall into two high level models:

    - push type

    This is roughly what you describe zone images as; a push-only model
    where the parent tells any child images what the new state of the
    parent image is after operations are completed.

    - pull type

    This is what you describe user images as currently; a pull-only
    model where the user or some service is responsible for detecting
    changes in a parent image and syncing the child user image.

    I'd suggest that user images could be both push and pull.  That is,
    an administrator may find it useful to create a special set of
    user images for creating isolated software stacks of specific
    packages that can't normally co-exist.  This is often seen with
    software installed in /opt as an example.

lines 253ff:
    Indeed, instead of "system publisher", I'd prefer to see the text
    "system repository" or "parent image repository" since that's what
    the functionality really is.

lines 270-288:
    I'm assuming that this complements / assumes implementation of
    bug 15343?

lines 298-299:
    s/do not do any do any check/do not perform any checks/

line 299:
    s/installed installed/installed/

lines 320-330:
    It seems like it would be helpful if beadm permitted linking any
    arbitrary ZFS filesystem to a given BE to enable this functionality.
    In particular, I was thinking that it would be useful for
    administrators to choose whether some filesystems remained
    independent or stayed in sync with a given boot environment.
    This could be particularly useful in the linked user image
    management of isolated software stacks I mentioned above or
    for the management of configuration data.

    I agree that this enhancement is out of scope for this proposal,
    but wanted to mention it.

lines 333-340:
    I could be mistaken, but I seem to recall from my past experience
    with translation and localisation that the terms parent/child
    were strongly preferred in general end-user documentation over
    master/slave.  I'd ask someone in l10n about this and then only
    use the terms they recommend.

line 365:
    s/access the/access to the/

line 375:
    s/mange/manage/

lines 375-384:
    When you say "constraints data", do you really mean "image state"?
    Or do you mean "constraints data" in the more generic sense
    (e.g. package versions, package state, package constraints)?

lines 386-391:
    I think /var/pkg/state/export, /var/pkg/state/import or
    /var/pkg/cache/state/export, /var/pkg/cache/state/import
    would fit better with the on-disk format proposal.  I don't
    understand why 'master' is part of the pathname here since
    there's no corresponding linked/child/* set of directories
    in this proposal either.

lines 393-399:
    This bit is really confusing to me:

    - What does the linked image type and content policy stored here
      indicate?

    - How does that meaning change based on whether it is in the export
      or import directory?

    - What form is this information stored in and when does it get
      stored?

    - It seems like images that you *push* changes to should simply get
      their data from a temporary directory that contains the exported
      information for the duration of the operation.  I'm not sure this
      should be stored in $IMGROOT/pkg.  For images that *pull* changes
      from the master image, realistically this information needs to be
      generated on demand instead of being stored in $IMGROOT/pkg.  If
      the purpose of this directory is to simply cache information that
      was imported or is being used to perform a sync operation, then
      I'd say it should live under /var/pkg/cache/linked (to fit with
      the on-disk proposal).  Does this data need to be kept longer
      than that?

    - I think that if you're exporting "constraint information" that it
      really sounds like you should just be generating an incorporation
      package since that's equivalent and fits with existing
      functionality.  In particular, I would anticipate pkg freeze
      to be based on incorporations so that would fit better with what
      you've proposed here.

    - If you're going to export state information for the parent image,
      then that's what's contained in the /var/pkg/state/installed
      directory.  Specifically, just catalog.attrs and catalog.base.C.

lines 401-407:
    If we're going to go this route, I'd like to see a documented
    serialisation format for the imageplan that can be used more
    generically.  It should also be JSON; not pickle-based.  You
    need a more general format anyway to fit with some of the
    functionality described later on in this proposal for --runid.
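
    To make the JSON suggestion concrete, here is a minimal sketch of
    a versioned plan format.  The field names ("version", "operation",
    "stage", "changes") and the helpers dump_plan/load_plan are purely
    hypothetical; nothing like this exists in pkg(5) today, and the
    real imageplan carries far more state than this.

```python
import json

PLAN_FORMAT_VERSION = 1  # hypothetical version number for the format


def dump_plan(plan):
    """Serialize a plan (a plain dict) to a JSON string."""
    doc = {"version": PLAN_FORMAT_VERSION, "plan": plan}
    # sort_keys gives stable output, which makes plans easy to diff.
    return json.dumps(doc, sort_keys=True, indent=2)


def load_plan(text):
    """Parse a JSON plan, rejecting versions we do not understand."""
    doc = json.loads(text)
    if doc.get("version") != PLAN_FORMAT_VERSION:
        raise ValueError("unsupported plan version: %r" % doc.get("version"))
    return doc["plan"]


# Hypothetical plan contents, for illustration only.
example = {
    "operation": "image-update",
    "stage": "evaluate",
    "changes": [
        {"old": "pkg:/foo@1.0", "new": "pkg:/foo@1.1"},
    ],
}
```

    Unlike pickle, a format like this can be documented, inspected by
    hand, and read by a different version of the client.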

lines 421-427:
    Since a linked image can realistically only account for the last
    sync'd state of a parent image, it seems like the parent image
    could simply provide constraints as a dynamically generated
    incorporation package (manifest) as part of the export/import
    process.  That manifest could then be stored locally and treated
    exactly like a package normally would be without any special logic.
    Doing so also makes it possible for the normal memory management
    that the client api uses internally to not have to have special
    logic to marshal this information to disk (possibly repeatedly).
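
    A rough sketch of what I mean by a dynamically generated
    incorporation follows.  The function name and the input format (a
    dict of package stem to version) are assumptions for illustration;
    the action syntax is what I'd expect a pkg(5) manifest to look
    like, not something taken from the design doc.

```python
def incorporation_manifest(name, version, constraints):
    """Return manifest text that incorporates each stem at its version.

    constraints maps a package stem to the version installed in the
    parent image at the time of the last sync.
    """
    lines = ["set name=pkg.fmri value=pkg:/%s@%s" % (name, version)]
    # Sort so the same parent state always yields the same manifest.
    for stem, ver in sorted(constraints.items()):
        lines.append("depend fmri=pkg:/%s@%s type=incorporate" % (stem, ver))
    return "\n".join(lines) + "\n"
```

    The child image could then store and evaluate the result like any
    other package manifest.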

lines 429-436:
    This could be greatly simplified by simply stating that the special
    system packages will not have a publisher.  That is simpler and
    works better than having a special string constant value for the
    publisher.  That also fits nicely with the transport framework
    since a publisher is required to perform transport operations, so
    simply checking for "if pfmri.publisher" is faster and we avoid
    the memory usage for the publisher string and parsing.  Every FMRI
    normally has a publisher, so you can pretty much be guaranteed that
    in any case where you'd care, you can rely on simply checking to see
    if an FMRI has a publisher.
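
    As a sketch of the check I have in mind (the Fmri class below is a
    hypothetical stand-in, not the real pkg.fmri implementation, and
    the helper names are made up):

```python
class Fmri(object):
    """Hypothetical stand-in for a parsed package FMRI."""

    def __init__(self, name, version, publisher=None):
        self.name = name
        self.version = version
        self.publisher = publisher  # None means "no publisher"


def is_system_package(pfmri):
    """Under this suggestion, a system package has no publisher."""
    return not pfmri.publisher


def needs_transport(pfmri):
    """Only packages with a publisher can be retrieved via transport."""
    return bool(pfmri.publisher)
```

    No special string constant has to be stored, parsed, or compared.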

line 442:
    I wonder if one of the existing reserved namespaces that were
    proposed in May 2008 [1] could be used instead?  In particular,
    feature, cluster, metacluster, or service?  At the least, I'd
    like to see the name be a bit more specific than
    "linkedimage/constraints"; perhaps "parent-image-constraints"?
    I'm uncertain if the site/ namespace could be used here as that
    would seem to fit nicely with the purpose of this package.

lines 456-465:
    Why would the first package be empty instead of just making this an
    update from the out of sync version to the in-sync version of this
    package?  Or alternatively, an uninstall of the old one and an
    install of the new one?

lines 481-491:
    This seems difficult and fragile for several reasons:

    - no guarantee that the operation in progress for the parent will
      complete successfully

    - does this distinguish between an operation being *planned* in
      the parent and one that is intended to be executed?

    - this implies that a parent image would always have to export
      (marshal?) its plan data during operations so that images that
      are linked to the parent (that the parent doesn't know about)
      can use this information

lines 526-543:
    Why are "in-core" packages defined using a package attribute
    instead of an incorporation?  In particular, forcing this to
    be a property doesn't seem very flexible since the definition
    of what an in-core package is could be drastically different
    depending on the use case.  Is the intent that this property
    is only used in special dynamically generated packages?

    I think I'd rather see a list of incorporations that were used
    to define the sync policy.  That would also allow pkg freeze
    policies in the parent image to be applied to the child image
    more easily.

    It also seems like the property values could be a bit more
    user friendly.  I'd suggest:

      sync-cip -> minimum

      sync-all -> exact

      superset -> possible


lines 565-570:
    independent-minimal:

    - This is a bit confusing to me as it seems to overlap with the
      li-content-policy property above.  I see that you referenced
      that, but what I don't understand is why this wouldn't *always*
      be the case.  Specifically, what's the real difference between
      this value and "independent"?  I would think that the linked
      child image's content-policy would have to always be honoured.

lines 572-579:
    Will administrators be able to leave images "linked" but simply put
    them into a "disabled" state of some sort?  I can definitely see
    administrators wanting to temporarily defer updating a linked image
    because of problems without preventing update of the parent image.
    I don't think simply unlinking it is the right solution as that
    means you lose the location information (which is valuable).  This
    is not for zones obviously as you can simply detach those, which I
    assume simply puts them into an equivalent offline state or they're
    automatically skipped even though they remain linked.

lines 589-595:
    While this list fits the current definition and model of client
    operation execution, this will be changing in the near future.  In
    particular, there may be multiple data retrieval phases which
    means that you can't rely on this model to determine what can and
    cannot be done during different parts of an operation.  More on
    that below in my comments for 610-636.

    As I mentioned above, I'd like to see a documented format for the
    plan serialization, and this may also be a good time to somewhat
    adjust how the client prepares and executes a plan to simplify the
    processes involved.  Since what you've proposed here is basically
    marshalling an imageplan at different phases, I think this needs
    to be more generic so that we can use this functionality for low
    or restricted resource environments as well (which zones may be).

line 626: s/equilivant/equivalent/

lines 610-636:
    Instead of making these project private interfaces and to make them
    more generally useful, I'd like to see them become a bit more
    generic with the hope that they'd eventually be suitable for
    end-users.  In particular, we already have a few RFEs open for
    being able to only perform the download portion of an operation,
    etc.  With that in mind, I'd suggest the following changes:

    runid -> --plan
        Path to a pkg(5) plan file.

    plan -> --stage
        Specifying --stage without --runid

    Another question is where the plan gets stored when you run it
    with the --stage option?  --runid seems too magical since it
    requires that the plan be stored within $IMGROOT, and I'd like
    for the plan to be retrievable from anywhere.  If that's not
    desirable, can you expound on why?

    Finally, I'd also like to see the stages changed a bit to be more
    general in light of my earlier comments about multiple download
    phases and to be a bit more user friendly:

        default
            I'm assuming this implies that if --plan-id *is* specified,
            then the client will resume from the last point in the
            plan and then just continue until completion of the
            operation.

        pkgs -> evaluate
            Just evaluates the operation (may trigger metadata
            retrieval in the v0 repository case).  This is enough
            to determine what will be installed but not the size of
            the operation (disk space) or how many actions will be
            involved.

        actions -> prepare
            Prepares for package content retrieval and operation
            execution.  This will retrieve package metadata (manifests).

        download
            Retrieve package content required to execute operation and
            exit.

        execute
            Execute the operation.
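
    One way to picture the resume behaviour with these stage names is
    sketched below.  This assumes the plan file records the last stage
    that completed; the function and its parameters are hypothetical
    and only illustrate the ordering, not how the client would
    actually drive each stage.

```python
STAGES = ["evaluate", "prepare", "download", "execute"]


def stages_to_run(last_completed=None, stop_after=None):
    """Return the stages still to run, in order.

    last_completed: last stage recorded in the plan file, or None for
        a fresh operation.
    stop_after: value given to --stage, or None for the default of
        continuing until the operation completes.
    """
    start = 0 if last_completed is None else STAGES.index(last_completed) + 1
    end = len(STAGES) if stop_after is None else STAGES.index(stop_after) + 1
    return STAGES[start:end]
```

    For example, stages_to_run(stop_after="download") would cover the
    "download only" RFEs, and a later call with
    last_completed="download" would pick up at execute.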

lines 643-784:
    I think verb oriented subcommands are easier to remember and type;
    that also fits with our existing subcommand naming pattern.

    Also, why require the -l option for specifying the identifier of
    linked images?  Our other subcommands simply accept positional
    operands instead.  Obviously you need to use an option for image-
    create, but the other subcommands don't really need one.

    So with the above in mind:

    linked-list -> list-linked

    linked-sync -> sync-linked

    linked-unlink -> unlink-image
        No corresponding "link-image" subcommand?  I'm aware that
        you account for it at image-create, wondered if post image
        create is also possible?

    linked-property -> linked-property
    linked-set-property -> set-linked-property
        Are these subcommands managing properties of the parent image
        that record information about a child (linked) image?  Or are
        they reading and manipulating the general image properties of
        the child (linked) image?

    image-create
        What characters are allowed in the linked image name?

        It seems like being able to name linked images can lead to
        naming collisions.  I'd like to suggest that this is strictly
        a human-readable "alias" for the image. And that you also change
        images to have a unique "id" (a UUID specifically), so that in
        the event that there is a naming collision, you can still
        perform operations on an image using its unique ID.

        Alternatively, you could leave it as "name", but I'd still like
        to see images have a unique ID that we could rely on instead.
        Again, I wonder if the name here is a property of the linked
        image being created, or something that the master is recording.

        If the name is a property recorded in the linked image, then
        I'd also suggest that the parent may want to reference linked
        images only by their unique id & path to allow the name to be
        changed at any time.
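
        A small sketch of the id-plus-alias idea, assuming the parent
        keeps the record (the class and method names here are
        hypothetical, not part of the proposal or of pkg(5)):

```python
import uuid


class LinkedImageRegistry(object):
    """Hypothetical parent-side record of linked images, keyed by UUID."""

    def __init__(self):
        self._images = {}  # image id (str) -> {"path": ..., "alias": ...}

    def link(self, path, alias=None):
        """Record a linked image and return its newly generated id."""
        image_id = str(uuid.uuid4())
        self._images[image_id] = {"path": path, "alias": alias}
        return image_id

    def rename(self, image_id, alias):
        """Change the alias; the id stays stable."""
        self._images[image_id]["alias"] = alias

    def find_by_alias(self, alias):
        """Return every id using this alias; more than one is a collision."""
        return sorted(i for i, rec in self._images.items()
                      if rec["alias"] == alias)
```

        If two images end up with the same alias, find_by_alias
        returns both ids and the user can disambiguate by id or path.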

lines 766-771:
    In the past, when I've suggested global options like this, it's been
    suggested that they be moved to the subcommands they actually apply
    to instead.  For example, I doubt this option really applies to the
    refresh, info, or list subcommands of pkg(1).  I'd also like to
    suggest that the option name be --ignore-linked or --skip-linked
    and can be specified multiple times.
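
    To illustrate a per-subcommand, repeatable option, here is a
    sketch using Python's argparse.  This is only an illustration: the
    pkg(1) client does its own option parsing, and which subcommands
    would accept --ignore-linked is my assumption, not something the
    proposal states.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="pkg")
    sub = parser.add_subparsers(dest="subcommand")
    # Assumed set of subcommands that modify the image and so accept
    # the option; each use of --ignore-linked appends one image name.
    for name in ("install", "uninstall", "image-update"):
        cmd = sub.add_parser(name)
        cmd.add_argument("--ignore-linked", action="append", default=[],
                         metavar="IMAGE")
        cmd.add_argument("operands", nargs="*")
    # Read-only subcommands do not accept the option at all.
    for name in ("refresh", "info", "list"):
        sub.add_parser(name).add_argument("operands", nargs="*")
    return parser
```

    With this layout, "pkg install --ignore-linked a --ignore-linked b
    foo" collects both image names, while "pkg list --ignore-linked a"
    is rejected as a usage error.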


Cheers,
-Shawn

[1] http://markmail.org/message/qep43eehttsyc5w7