Re: [pkg-discuss] initial linked images design doc

Edward Pilatowicz Tue, 01 Jun 2010 17:33:37 -0700

On Thu, May 27, 2010 at 07:19:23PM -0700, Shawn Walker wrote:
> On 05/27/10 07:01 PM, Edward Pilatowicz wrote:
> >On Thu, May 27, 2010 at 11:17:29AM -0700, Shawn Walker wrote:
> >>On 05/26/10 09:25 PM, Edward Pilatowicz wrote:
> >>>On Mon, May 24, 2010 at 01:39:43PM -0700, Shawn Walker wrote:
> >>I'd like to work with you to figure out some way we can avoid that,
> >>or alter the design where being able to link to a parent image isn't
> >>possible without some sort of explicit enabling of linking
> >>functionality.
> >>
> >
> >sure.  although i still don't understand why you would want this.  for
> >push based children you need to have write access to the parent image.
> >that's essentially "explicitly enabling".  for pull based children i
>
> The question is, why does it *need* write-based access to the parent image?
>


the only operations that need write access to the parent are those that
are going to update the parent image itself.  so say you wanted to
create a new export/push based child image.  you'd need write access to
the parent image so that you could update the metadata in the parent to
let it know about that child.

> >don't really see why you'd want to do this.  if i wanted to have a
> >partial user image that was synced up to whatever is installed on
> >jurassic, then since i have read access to jurassic i should be able to
> >do this.  i don't see why i'd have to ask the jurassic admins for
> >permissions first...
>
> Err...I'm confused now.  Above you said write access to the parent
> image is needed.  In what cases would a child need write access to a
> parent to perform a sync?
>

the example above is of a pull/import based child.  the parent doesn't
know anything about these types of children and for these types of images
you wouldn't need write access to the parent.

> >>>so really the zone needs to know about a subset of the information in
> > >>>catalog.attrs and catalog.base.C.  now admittedly, i could represent
> >>>that data in the same format as the current catalog.attrs and
> >>>catalog.base.C files, but i haven't seen any advantage to doing that.
> >>>as i mentioned before, currently i just have an xml file that lists the
> >>>information, but i'm not wedded to that, it's more an implementation
> >>>detail.
> >>
> >>The advantage of re-using the catalog format is avoiding introducing
> >>an additional project private format and the fact that a lot of
> >>optimisation work has been done to efficiently store catalog data in
> >>that format.
> >>
> >
> >sure.  and if performance turns out to be an issue we can work on this
> >intermediate data exchange format as well.  hell, it could be the same
> >format as the current catalog.  (doesn't really matter to me.)
> >
> >to put some real numbers to this, currently for my testing, i'm using
> >a snv_136 parent image with redistributable installed.  for linked
> >images with a minimum sync policy the current package sync list (which
> > >is a text-based xml file) is 34K.  if i decide to sync everything, then
> >that goes up to 138K.
>
> The JSON equivalent should come in smaller than that, but I'd need
> to see what data is recorded first.
>

here's an example of my current data:
---8<---
<?xml version="1.0" encoding="ascii"?>
<master_installed>
  <item value="pkg://opensolaris.org/storage/[email protected],5.11-0.136:20100326T220926Z"/>
  <item value="pkg://opensolaris.org/system/xopen/[email protected],5.11-0.136:20100326T223854Z"/>
  ...
</master_installed>
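
fwiw, here's a minimal sketch of how a consumer could read a list in that
format.  the function name and the sample fmris are made up for
illustration (they are not the entries from my real file), and this is
not the actual pkg(5) code:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the same shape as the sync list above.
# The package names and versions are placeholders.
SAMPLE = """<master_installed>
  <item value="pkg://opensolaris.org/example/foo@1.0,5.11-0.136:20100326T220926Z"/>
  <item value="pkg://opensolaris.org/example/bar@1.0,5.11-0.136:20100326T223854Z"/>
</master_installed>
"""


def read_sync_list(xml_text):
    """Return the FMRI strings listed in a <master_installed> document."""
    root = ET.fromstring(xml_text)
    if root.tag != "master_installed":
        raise ValueError("unexpected root element: %s" % root.tag)
    # Each <item> carries one FMRI in its "value" attribute.
    return [item.attrib["value"] for item in root.findall("item")]


if __name__ == "__main__":
    for fmri in read_sync_list(SAMPLE):
        print(fmri)
```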


> >>>>lines 421-427:
> >>>>     Since a linked image can realistically only account for the last
> >>>>     sync'd state of a parent image, it seems like the parent image
> >>>>     could simply provide constraints as a dynamically generated
> >>>>     incorporation package (manifest) as part of the export/import
> >>>>     process.  That manifest could then be stored locally and treated
> >>>>     exactly like a package normally would be without any special logic.
> >>>>     Doing so also makes it possible for the normal memory management
> >>>>     that the client api uses internally to not have to have special
> >>>>     logic to marshal this information to disk (possibly repeatedly).
> >>>>
> >>>
> >>>possibly, but isn't manifest dependency information cached in the
> >>>catalog?  if so this would require re-writing the catalog in zones which
> >>>requires reading the catalog in zones.  as i've pointed out before, that
> >>>would be bad.
> >>
> >>The catalog has to be read in zones; again, I'm not following the
> >>security/bad logic.  A catalog is used to track the state of the
> >>image, if we don't trust catalogs, then the client is useless.
> >>
> >>The only way to know the state of a child image is to read its catalog.
> >>
> >
> >correct.  and operations which are initiated on a parent image should
> >never do this.  only pkg processes running at reduced privileges or in
> >special environments (zones or scratch zones) should access child
> >images.
>
> Don't see a problem then; I wasn't suggesting that the parent read
> any of the child's image data.  I think we're in agreement here.
>
> Although, I'd point out that technically, it's not safe for a child
> image to read the parent's for the same reasons you point out above.
> Quite an impasse...
>

well, in the case of zones we always trust the global zone.  if the
global zone has been compromised then it's game over.

> >>>in case it isn't obvious, i'm really trying hard to keep data flowing in
> >>>one direction.  from the parent image to the child.  i'm also trying to
> >>>keep the data flow simple.  having pkg traverse directories in a zone
> >>>image and read data from that zone isn't safe.  i tried to call this out
> >>>early in the document in the "zones requirements" section:
> >>>
> >>>- since zones are untrusted execution environments, global zone pkg(5)
> >>>   operations should not be required to read data from non-global zones.
> >>>   ie, any data flow required to support linked images must be from a
> >>>   global zone to a non-global zone, and never in the reverse direction.
> >>
> >>I don't believe I've suggested anything that would require a ngz to
> >>read data from a gz.  I've only suggested *how* to export the data
> >>and the process to use to import it.
> >>
> >
> >perhaps i'm not understanding your suggestions.  i have no problems
> > >changing the data format used to export/push data to clients.  i don't
> > >want to use the existing on-disk parent catalogs because in many cases
> > >that exposes too much information. i'm perfectly fine with using the
> >catalog format to expose a subset of what's installed.  if you've
> >suggested something else then i'm sorry but i've failed to understand
> >it.
>
> Ah, I see the confusion now.  Sorry.  The catalog format is fairly
> fungible in the information that you can store.  But the base
> information doesn't really expose anything at all--it's just a list
> of packages and an *optional* subset of each package's manifest
> data.
>
> I wasn't strictly suggesting that you use the /var/pkg/state
> directory as is.
>

well, if i can remove all the optional bits then it's something to
consider.  i'd have no objections to switching to that.  (and i'm
guessing it would give me versioning for free as well.)

> >>>>lines 429-436:
> >>>>     This could be greatly simplified by simply stating that the special
> >>>>     system packages will not have a publisher.  That is simpler and
> >>>>     works better than having a special string constant value for the
> >>>>     publisher.  That also fits nicely with the transport framework
> >>>>     since a publisher is required to perform transport operations, so
> >>>>     simply checking for "if pfmri.publisher" is faster and we avoid
> >>>>     the memory usage for the publisher string and parsing.  Every FMRI
> >>>>     normally has a publisher, so you can pretty much be guaranteed that
> >>>>     in any case where you'd care, you can rely on simply checking to see
> >>>>     if an FMRI has a publisher.
> >>>>
> >>>
> >>>sure.  i was never wedded to the name "none", and if this can be made to
> >>>work no problem.  it's just that there's a lot of code that assumes a
> >>>publisher exists, so going this route might end up requiring more code
> >>>changes than defining a special publisher.  (bart had suggested using an
> >>>invalid character in the publisher name to signify that it was
> >>>"special.")
> >>
> >>I'd rather fix the code that assumes a publisher exists than
> >>propagate that bad assumption.  There's lots of code that works
> >>without a publisher too.
> >>
> >>The logic for just not having one at all is simple--it completely
> >>avoids any potential collisions with an actual publisher and it
> >>ensures that the behaviour that we want for these is generic and
> >>consistent and doesn't rely on magic values.
> >>
> >
> >so.  is it possible to have any other packages installed on a system
> >which have no publisher?  for example, if i install a package and then
>
> and then?  I assume you mean and then "remove the publisher".
> Doesn't matter; installed packages *always* have a publisher.
> Likewise packages you can install *always* have a publisher.
> Remember that the publisher is just an identity string of who
> published the package.  So what you do with unset-publisher has no
> effect.
>

sorry for the incomplete thought there.  i tested this out myself to
verify the behavior you were describing.

so i was planning on switching from "none" to some other special name,
but i guess going to no publisher name would be ok as well.  so if i do
that i'd be switching from pkg uris that look like:
        pkg://none/...
to
        pkg:///...

i can add that to my todo list.
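
as a rough sketch of what the "does this fmri have a publisher" check
could look like on plain uri strings (split_publisher is a made-up
helper for illustration, not the pkg.fmri api, and the package name in
the usage lines is a placeholder):

```python
def split_publisher(fmri):
    """Split a 'pkg://publisher/rest' URI into (publisher, rest).

    The publisher is returned as None when the URI has the empty
    'pkg:///rest' form, so callers can simply test 'if publisher:'.
    """
    prefix = "pkg://"
    if not fmri.startswith(prefix):
        raise ValueError("not a pkg:// uri: %r" % fmri)
    publisher, sep, rest = fmri[len(prefix):].partition("/")
    if not sep:
        raise ValueError("missing package name: %r" % fmri)
    return (publisher or None, rest)


if __name__ == "__main__":
    print(split_publisher("pkg://none/example@0,0-0"))
    print(split_publisher("pkg:///example@0,0-0"))
```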

> >>My understanding of what you had written was that you had one
> >>package that represented the last known good set of constraints on
> >>the image, another package with the new set of constraints on the
> >>image, and then one to account for in-progress updates.
> >>
> >>So, my belief was that the first package represented what was
> >>already "installed" in the image, that is, its initial state.  I
> >>don't see how that could possibly be invalid to start with since it
> >>represents the state of the child image after the last operation was
> >>performed.
> >>
> >>My belief then was that the other two packages represented the state
> >>you were upgrading to, which the solver should be able to handle,
> >>and if they're invalid, that's because the parent image is in an
> >>invalid state, at which point, none of this matters.
> >>
> >
> >no.  let me try to re-state this.  we have three packages:
> >
> > >- constrai...@0,0-0 - this package is always marked as installed.  it
> > >   represents the current constraints on an image iff that image is in
> > >   sync with its current constraints.  if an image is NOT in sync with
> > >   its current constraints, this package is empty.
> > >
> > >- constrai...@0,0-1 - this package is never marked as installed. if an
> > >   image is in sync with its current constraints, this package is empty.
> > >   if an image is OUT of sync with its current constraints, this package
> > >   represents the current constraints and installing this package will
> > >   bring the image into sync with its current constraints.
> > >
> > >- constrai...@0,0-2 - this package is never marked as installed.  the
> > >   contents of this package represent the planned constraints on an
> > >   image and installing this package will bring the image into sync with
> > >   its planned constraints.
> >
> > >now.  here's an example.  say we have an out-of-sync image wrt its
> > >current constraints.  the constrai...@0,0-0 package will be installed
> > >but it will be empty.  so to sync this image we install the
> > >constrai...@0,0-1 package.  this updates the image (if possible) to be
> > >in sync with its current constraints.  of course, once this operation
> > >is done if we list the package contents in the image we'll see that
> >constrai...@0,0-0 is still installed.  but once the image is in sync,
> >the contents of constrai...@0,0-0 will represent the constraints on the
> >image.
>
>
> Your explanation makes sense, but I still don't understand why this
> can't be represented as an upgrade from one state to another.  In
> other words, why have all of these different packages and all of
> this special empty/not empty logic.
>
> It would seem more logical to me to simply have the installed
> version of the constraints packages represent whatever the last set
> of constraints were that were installed.  Then, any constraints you
> want to apply are a newer version of that same package (timestamp
> only?).
>

that's pretty much what i'm doing.  :)  i'm just bumping the pkg version
instead of the timestamp.
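
to make the empty/not-empty rules for the three packages described
above concrete, here's a small sketch.  the function name, the use of
sets for package contents, and keying by version string are all
assumptions for illustration, not how the prototype stores things:

```python
def constraint_packages(in_sync, current, planned):
    """Return the contents of the three constraint package versions.

    in_sync -- True if the image satisfies its current constraints
    current -- iterable of current constraint entries
    planned -- iterable of planned constraint entries
    """
    empty = frozenset()
    current = frozenset(current)
    return {
        # always installed; holds the constraints only when in sync
        "0,0-0": current if in_sync else empty,
        # never installed; holds the constraints only when out of sync,
        # so installing it brings the image back into sync
        "0,0-1": empty if in_sync else current,
        # never installed; always holds the planned constraints
        "0,0-2": frozenset(planned),
    }


if __name__ == "__main__":
    pkgs = constraint_packages(False, ["a@1"], ["a@2"])
    for version in sorted(pkgs):
        print(version, sorted(pkgs[version]))
```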

> There's just too much magic here, so I assume I'm still missing
> something.  I'd like to chat with you next week about this in person
> so that I can get a better understanding.
>

sure we can chat about this.

> >>>where the constraints on linked-image-name are type specific.  if there
> >>>are no name collisions then a user can just specify <linked-image-name>.
> >>>if there are collisions a full name needs to be specified.  so some
> >>>examples would be:
> >>>   zones:<linked-image-name>
> >>>   default:<linked-image-name>
> >>>   user:<username>,<linked-image-name>
> >>>
> >>>for zones, the linked-image-name would have to conform to the
> >>>restrictions on zone names.
> >>>
> >>>for user images, the username would have to conform with what's
> >>>specified in passwd.4 as a valid username format.
> >>
> >>I don't think username should be in the image name.  User images
> >>aren't literally tied to a specific "user".  usernames can also be
> >>considered security-sensitive information, so I'd rather not have
> >>them there.
> >>
> >
> >well, user linked images are out of scope for my current proposal, so i
> >don't feel compelled to nail down a naming strategy.  i was just more
> >listing one above as an example.
>
> Ah, okay.  Probably best to just omit user images from the document
> entirely other than to say this proposal doesn't address them ;)
>
> The more I understand about this project, the more useful it seems
> to user images, but at the same time, the more it becomes clear that
> user images need their own separate design cycle.
>

well, i thought it was important to mention user linked images, since
thinking about them has influenced my design.  (for example, user linked
images are handy for pointing out that we need nesting, to allow for
user linked images within zone linked images, etc.)

so i'd like to still mention them, but perhaps i should explicitly add
them to the "out of scope" section?  currently, when i talk about user
linked images i say "Support for user linked images is NOT included in
this proposal.", but i don't mention them in the "out of scope" section
(which is a bit weird).

ed
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
