Re: [pkg-discuss] [REVIEW] Phase 2 of Catalog v1 Format Changes

Shawn Walker Tue, 08 Sep 2009 19:36:22 -0700

Danek Duvall wrote:

Shawn Walker wrote:

http://cr.opensolaris.org/~swalker/pkg-cat-p2-2/


Looks pretty good; most of this is nits, and I'm pretty sure that some of
my questions were already raised.

Bugids should be sorted in increasing numerical order, though if there's one
umbrella bug to capture the entire change, it can go up top.

Does this fix bug 7063?

client.py:

  - line 356: why not img.PKG_STATE_KNOWN?

So, I've obviously been a bit opaque on package states, so let me attempt to clarify:

PKG_STATE_KNOWN - I axed this, since, in my mind, 'known' means that the package system has knowledge *of* the package; and as such, all 'packages' are *known* to the packaging system if they are installed or available in a package repository's catalog.

The only reason that this still says 'known' is because I wanted to defer changing the output of pkg list. My thought was to dump the current 'state' column and add an 'A' column (for available) to the UFIX set of columns we currently have. I can do so in this changeset. Either way I need to open a bug for this.

PKG_STATE_AVAILABLE - This state replaces PKG_STATE_KNOWN. This state is only applied to packages that are currently *available* in a repository catalog.

PKG_STATE_INSTALLED - This state simply indicates that the package is installed.

PKG_STATE_PREFERRED - This state is a temporary one that is only used in combination with PKG_STATE_INSTALLED. It indicates that the version of the package that is installed is from a publisher that was *preferred* at the time of installation. This status will become obsolete once we start ranking publishers instead of having the 'preferred' sledgehammer that we do now.
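
As a rough illustration of the states described above, the following sketch models them as plain integers attached to each package. The numeric values, the `states` mapping, the example FMRIs and the helper `is_known()` are assumptions made for this example, not the actual image.py definitions.

```python
# Hypothetical values; the real constants live in image.py and may differ.
PKG_STATE_AVAILABLE = 1   # listed in a repository catalog
PKG_STATE_INSTALLED = 2   # installed in the image
PKG_STATE_PREFERRED = 3   # installed from the then-preferred publisher

def is_known(states):
    """A package is 'known' if it is installed or available."""
    return bool(states & {PKG_STATE_AVAILABLE, PKG_STATE_INSTALLED})

# Example package-to-states mapping (FMRIs are made up for illustration).
states = {
    "pkg://example.org/foo@1.0": {PKG_STATE_AVAILABLE},
    "pkg://example.org/bar@2.0": {PKG_STATE_INSTALLED, PKG_STATE_PREFERRED},
    "pkg://example.org/baz@3.0": set(),
}

known = sorted(f for f, s in states.items() if is_known(s))
```

In this toy model 'known' is derived from the other states rather than stored, which matches the reasoning given for dropping PKG_STATE_KNOWN.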

api.py:
  - line 105(?): do you no longer need to call load_config() on the image?

Nope. That's one of those things I wanted to get away from. I want callers to know as little as possible about the Image object. A caller can tell the object to save image configuration, but it seemed silly to require callers to know that they have to call load_config() to load it. In fact, I've just changed load_config() to be private since no outside callers use it now ...

image.py:

  - line 134ff: should these just be integers?  Bitfields?  I'm not sure

I've changed them to integers. Given JSON's representation of Python data types, that seems the best fit.

    what some of these mean -- "preferred" state of a package? What's the
    difference between the two "installed" states?  Should the preferred
    state be private (__)?


I've added comments explaining what each state means.
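
The remark about JSON can be illustrated with a short sketch: integers survive a JSON round trip unchanged, whereas a tuple comes back as a list. The state values and the `entry` layout are assumptions for this example only.

```python
import json

# Hypothetical integer states, stored as a list so they serialize cleanly.
entry = {"version": "1.0", "states": [1, 2]}

restored = json.loads(json.dumps(entry))

# A tuple, by contrast, is written as a JSON array and read back as a list.
tuple_roundtrip = json.loads(json.dumps({"states": (1, 2)}))
```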

  - line 1013: self.__catalogs.pop(name, None)?


Cool.  I've been missing perl's "del returns deleted value" behaviour :)
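
For reference, a minimal sketch of the dict.pop() behaviour suggested above; the `catalogs` dictionary and its keys are placeholders, not the real image.py attribute.

```python
# Placeholder mapping standing in for the image's catalog cache.
catalogs = {"installed": object(), "known": object()}

# pop() removes the key and returns its value in one step.
removed = catalogs.pop("installed", None)

# With a default, a missing key returns the default instead of raising KeyError.
missing = catalogs.pop("no-such-catalog", None)
```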

  - line 1080: this either/or seems a little sketchy to me.  Wouldn't you
    care almost exclusively whether the package was installed with the
    preferred publisher at the time, and not really care about what the
    current preferred publisher is?

The sketchiness is intentional as it mimics exactly what we previously did. As an example, if a publisher was preferred previously, then its prefix would not be included when calling get_pkg_stem(). In addition, when a package was installed, if the publisher it was installed from was preferred at the time of installation, then the publisher's prefix was written out to that package's installed file with __PRE__ intact. This had the effect that get_pkg_stem() calls omitted the prefix for FMRIs of packages from both the *current* preferred publisher and *previously* preferred publishers.

This method's existence is mostly a backwards-compatibility hack for modules/client/api.py's plan_update_all.

  - line 1431: so we'll actually rewrite the installed files, upgrading to
    the latest format of those files?  Do we ever use it again?


Nice spot; that should have been removed completely.

  - line 1480: small suggestion: the if/else could just define generators
    that yielded fmristrs, and then the four shared lines could simply draw
    from whichever generator was defined.

Actually, only three lines aren't shared that I can see, but I've consolidated this (though a bit differently).
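
The generator suggestion might look something like the sketch below. It is not the code at line 1480; the two sources, the function names and the `use_index` flag are all hypothetical and only show the shape of the idea.

```python
def fmristrs_from_files(lines):
    """Yield FMRI strings read from file-like lines (hypothetical source)."""
    for line in lines:
        line = line.strip()
        if line:
            yield line

def fmristrs_from_index(index):
    """Yield FMRI strings from a stem -> versions mapping (hypothetical source)."""
    for stem, versions in index.items():
        for ver in versions:
            yield "%s@%s" % (stem, ver)

def load(source, use_index):
    # Each branch only picks a generator; the shared work happens once below.
    if use_index:
        gen = fmristrs_from_index(source)
    else:
        gen = fmristrs_from_files(source)
    return sorted(set(gen))
```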

  - line 1606: I'm probably confused, but I thought the installed catalog
    was not a subset of the known catalog -- that is, packages which are
    installed but not known by any publisher wouldn't show up in the known
    catalog.

See state definition far above. The 'installed' catalog is simply a subset of the 'known' catalog. That makes state management a lot easier IMO :)

manifest.py:

  - line 503: why this change?  For short lines, this could end up being
    slow.

Some of our manifest files are rather gigantic (10+ megabytes?). I figured this was a bit more memory friendly, but I'm willing to change this back.

publisher.py:
  - line 1031: when does this get called?  Won't the catalogs have all been


At the end of refresh().

    converted when the image called __upgrade_image()?  Could that call the
    appropriate code here?  They seem reasonably similar, though obviously
    this one doesn't update the installed and known catalogs.

Remember that this is Phase II, so no repository provides v1 catalogs, only v0 catalogs. So the client has to retrieve the v0 catalog and then immediately convert it to v1 format. Even when Phase III is complete, the client will still have to support repositories that only provide v0 catalogs.

  - line 1099: I guess I don't understand this, either.  Hm.  Well, I see
    that if it's just not there, then it doesn't do anything, which is
    fine.  But what happens if both the v0 and the v1 catalog are on the
    system?  I guess it's unlikely to happen, but if it's the case, then
    it'll take the v0 as canonical, which seems wrong to me.  If you took
    the v1 first, then you could even ignore a failure to remove the v0
    catalog.

The picture is incomplete here until Phase III is implemented. See above. I've clarified the comment on line 1099 to better indicate the issues here.

  XXX line 1134: didn't the get_catalog() call get a v1 catalog?


No.  See above.  I've added a comment here to make this clearer.

catalog.py:
  - line 315: super() is generally considered dangerous; what's the reason
    you're using it here?  And shouldn't __init__() be called with "self"
    as the first arg?

That depends on who you ask. It's still around in py3k, and is in fact being enhanced further. However, since I now have two detractors and I don't yet have a need for this, I'll simply remove super(). And no, when using super() you don't pass self to the function; that's why self is specified in the super() call [1].


The test suites wouldn't have passed otherwise :)
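
A small sketch of the calling convention in question; the class names are invented for the example and have nothing to do with catalog.py.

```python
class Base(object):
    def __init__(self, name):
        self.name = name

class Derived(Base):
    def __init__(self, name, version):
        # self is given to super(), so it is not repeated in the __init__() call.
        super(Derived, self).__init__(name)
        self.version = version

d = Derived("catalog", 1)
```

The equivalent without super() would be Base.__init__(self, name), where self does have to be passed explicitly.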

  - line 389: might this be clearer:

        return cmp(a[1], b[1]) or cmp(a[0], b[0])

    you could even do it as a lambda in the sorted() call, but that might
    not be worth it.  I'd probably create a generator expression that would
    yield pub/stem tuples, and an else clause (if not ordered) to do the
    same, and then iterate over those pairs outside the if for both cases.

Hm, now that I look, I don't actually use 'ordered' anywhere, so I've dumped all the code for this. But I'll use your suggestion later when I need it. Thanks :)
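
For later reference, the suggested comparison can be sketched as follows. cmp() was a builtin in the Python of the time; a small stand-in is defined here so the example also runs on current Python. The (publisher, stem) pairs and the function name are assumptions for illustration.

```python
from functools import cmp_to_key

def cmp(a, b):
    """Stand-in for the Python 2 builtin cmp(), which Python 3 removed."""
    return (a > b) - (a < b)

def by_stem_then_pub(a, b):
    # cmp() returns 0 (falsy) on a tie, so 'or' falls through to the next key.
    return cmp(a[1], b[1]) or cmp(a[0], b[0])

# Hypothetical (publisher, stem) pairs.
pairs = [("pub2", "foo"), ("pub1", "foo"), ("pub1", "bar")]

ordered = sorted(pairs, key=cmp_to_key(by_stem_then_pub))

# The same ordering expressed as a key function.
ordered_by_key = sorted(pairs, key=lambda p: (p[1], p[0]))
```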

  - line 421: didn't we determine at some point that a try/except setup was
    more expensive than testing for existence?  You could also just use
    .get(name, ()), and iterate over the empty list.


I had forgotten about that; changed.
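
The two approaches compared above, side by side in a minimal sketch. The `entries` layout and both function names are made up for the example; no timing claim is made here.

```python
# Hypothetical catalog layout: stem -> list of version strings.
entries = {"foo": ["1.0", "1.1"]}

def versions_try(name):
    """Look the name up and handle the failure afterwards."""
    try:
        return list(entries[name])
    except KeyError:
        return []

def versions_get(name):
    # A missing name yields the empty tuple, so the loop body never runs.
    return [ver for ver in entries.get(name, ())]
```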

  - it seems that fmris() and entries() could probably be defined in terms
    of each other; there's a lot of shared code here.

This was purely a performance guess on my part; that is, that fmris() simply calling entries() but not yielding the entry data itself would be slower than simply doing what it is doing now. However, I'm willing to change it.
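
Defining one in terms of the other could look roughly like this. The class below is a toy stand-in with an assumed data layout, not the real catalog.py class, and it says nothing about the performance question raised above.

```python
class Catalog(object):
    """Toy stand-in; not the real catalog.py class."""

    def __init__(self, data):
        # Hypothetical layout: {publisher: {stem: {version: metadata}}}
        self._data = data

    def entries(self):
        """Yield (fmri string, metadata) pairs."""
        for pub, stems in self._data.items():
            for stem, versions in stems.items():
                for ver, meta in versions.items():
                    yield "pkg://%s/%s@%s" % (pub, stem, ver), meta

    def fmris(self):
        """Yield only the FMRI strings, reusing entries()."""
        for fmri, _meta in self.entries():
            yield fmri

cat = Catalog({"example.org": {"foo": {"1.0": {"size": 10}}}})
```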

  - line 574: using a generator might be a bit less memory-intensive than a
    list.

If you mean returning a generator, it was intentional that I did not. The point is to return the *unique* set of package names for all publishers within the catalog.
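
A sketch of why a set is the natural return value here: the same name can appear under several publishers, so the duplicates have to be collapsed before anything is returned. The `by_publisher` layout and the `names()` helper are assumptions for this example.

```python
# Hypothetical layout: publisher -> iterable of package names.
by_publisher = {
    "pub1": ["foo", "bar"],
    "pub2": ["bar", "baz"],
}

def names(data):
    """Return the unique package names across all publishers."""
    unique = set()
    for stems in data.values():
        unique.update(stems)
    return unique
```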

  - line 683,684: what are rename and obsolete in this context?  Why do we
    need anything other than add and remove?

This was a mostly incomplete thought left over from the old catalog that had special api() calls for obsolete/rename. However, in retrospect, you're correct that we only really need add/remove.

  - line 1273: I don't see that behavior.

I did while running the test suite on b117 :( I'll check again, but I know I didn't imagine this.

  - line 1281: isn't this likely to get us into exactly the same trouble
    that got us into this except clause in the first place?  Or is it just
    that it won't be for EPERM, and everything else should get raised
    anyway?


My assumption (and the previous one) was the latter.

  - line 1444: no need for an atomic update here?


Do you mean copy the file somewhere else and then rename it into place?

It's definitely not the intent to rename the file into place straight-away as that would remove it from the source directory (which is definitely not intended).
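
The copy-then-rename approach asked about above could be sketched like this. The function name is hypothetical, and os.replace() is a modern Python call used here for illustration; the code at line 1444 is not shown in this thread.

```python
import os
import shutil
import tempfile

def copy_into_place(src, dest):
    """Copy src to dest so that dest is never seen half-written."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    # The temporary file is created in the destination directory so that the
    # final rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=dest_dir)
    os.close(fd)
    try:
        shutil.copyfile(src, tmp)
        os.replace(tmp, dest)  # atomic on POSIX; src is left untouched
    except Exception:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Only the temporary copy is renamed, so the file stays in the source directory.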

  - line 1593: is there any reason not to always have BASE included?

If it's data the caller doesn't need, then I'm just wasting cycles adding it to the metadata returned and wasting memory. At least, that was the thought...

  - line 1838: this is only used in some tests now; perhaps it can be
    retired?  Same with extract_matching_fmris() now, too.  I see you added
    a "reverse" argument, but I don't see it used anywhere.

In my initial prototype, I had re-written image.py:__inventory() to use get_matching_fmris() instead. However, it ended up being massively slower for update all because of the simple matching approach it currently uses (i.e. for each package installed on the system, attempt to match the package against *all* known packages...ugh).

I'd like to eventually re-write __inventory to use get_matching_fmris() as it seems useful to have that sort of functionality in the catalog.
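
To make the cost of the simple matching approach concrete, here is a toy comparison of matching every installed package against every known package versus indexing the known packages once. The data and both functions are hypothetical and much simpler than get_matching_fmris().

```python
# Hypothetical data: installed stems and known (stem, version) pairs.
installed = ["foo", "bar"]
known = [("foo", "1.0"), ("foo", "2.0"), ("bar", "1.0"), ("baz", "1.0")]

def match_naive(installed, known):
    """Compare every installed stem against every known entry: O(n * m)."""
    return {s: [v for (k, v) in known if k == s] for s in installed}

def match_indexed(installed, known):
    """Index the known entries once, then look each stem up directly."""
    index = {}
    for stem, ver in known:
        index.setdefault(stem, []).append(ver)
    return {s: index.get(s, []) for s in installed}
```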

I've fixed everything you don't see mentioned. I'll send out a webrev tomorrow after I've had some time to test and re-review.


Cheers,
--
Shawn Walker

[1] http://docs.python.org/library/functions.html#super