Re: [pkg-discuss] [REVIEW] Phase 2 of Catalog v1 Format Changes

Shawn Walker Wed, 09 Sep 2009 09:46:12 -0700

Danek Duvall wrote:

Shawn Walker wrote:

So, I've obviously been a bit opaque on package states, so let me
attempt to clarify:

PKG_STATE_KNOWN - I axed this, since, in my mind, 'known' means that
the package system has knowledge *of* the package; and as such, all
'packages' are *known* to the packaging system if they are installed
or available in a package repository's catalog.

The only reason that this still says 'known' is because I wanted to
defer changing the output of pkg list.  My thought was to dump the
current 'state' column and add an 'A' column (for available) to the
UFIX set of columns we currently have.  I can do so in this
changeset. Either way I need to open a bug for this.

PKG_STATE_AVAILABLE - This state replaces PKG_STATE_KNOWN.  This
state is only applied to packages that are currently *available* in a
repository catalog.


I don't really care, since this is all deep internals, but I think this is
gratuitous.  I don't see how "known" is any different from "available".
Neither English word really implies that the package isn't installed, which
is, as far as I can tell, your complaint with the word "known".  So either
word is just fine as a technical term that means "known to the system, but
not installed".

I'm open to suggestions here, or I can revert to KNOWN if you thinkthat's acceptable. I was half tempted to use the state names indoc/pkg-states.txt :) IN_CATALOG and OUT_CATALOG perhaps?

As far as the output of "pkg list" is concerned (and I think you're not
changing that here, right?), please make sure Stephen is aware of any
change you're making, as he's the one who designed the original, including
the separation between the STATE and UFIX columns.

We could honestly use the extra space for FMRIs (given our upcomingrename), and since packages will have multiple states in the nearfuture, I think we should do that.

api.py:
 - line 105(?): do you no longer need to call load_config() on the image?

Nope.  That's one of those things I wanted to get away from.  I want
callers to know as little as possible about the Image object.


This is the API, though -- if it does the work, then no callers have to.
The change is fine, though.

I just wanted to hide that even further so that the client api doesn'thave to worry about it either. I sort of took the approach that callersthat use the image object (regardless of who they are) shouldn't have tocall load_config; the image should just handle that as needed.

The sketchiness is intentional as it mimics exactly what we previously
did.  As an example, if a publisher was preferred previously, then its
prefix would not be included when calling get_pkg_stem().  In addition,
when a package was installed, if the publisher it was installed from was
preferred at the time of installation, then the publisher's prefix was
written out to that package's installed file with __PRE__ intact.  This
had the effect of causing get_pkg_stem() calls on FMRIs of packages of
the *current* preferred publisher and *previously* preferred publishers
being omitted.

This method's existence is mostly a backwards-compatibility hack for
modules/client/api.py's plan_update_all.


Um, then it could simply be gotten rid of, no?  There's no sense in
carrying around backwards-compatibility hacks for internals.

Until we get rid of preferred publisher I need it to retain our currentbehaviour; I'll add a comment indicating that.

 - it seems that fmris() and entries() could probably be defined in terms
   of each other; there's a lot of shared code here.

This was purely a performance guess on my part; that is, that fmris
simply calling entries but not yielding the entry data itself would
be slower than simply doing what it is doing now.  However, I'm
willing to change it.


Yeah, I dunno.  It always pains me to see multiple blocks of code which are
almost identical, simply because of the maintenance burden.  If there's a
performance gain to be had, there probably should be a comment explaining
it.

Right now, there's the additional nit that fmris() provides the objects=parameter, but I'll look at this again. It's already substantiallysmaller since I got rid of all of the ordered logic...

 - line 574: using a generator might be a bit less memory-intensive than a
   list.

If you mean returning a generator, it was intentional that I did not.
The point is to return the *unique* set of package names for all
publishers within the catalog.


Sorry, I was unclear.  I meant using a generator expression instead of a
list as the basis for the set:

    return set((
        stem
        for pub in self.__data
        for stem in self.__data[pub]
    ))

So for n-thousand package stems you won't have the entire copy in memory
both as a list and as a set; you'll just have the set, and the generator
expression will only spit one out at a time.


Ah; that's subtle, but makes sense.  Thanks for that.

 - line 1273: I don't see that behavior.

I did while running the test suite on b117 :(  I'll check again, but
I know I didn't imagine this.


I'm sure you didn't; perhaps your description of the condition is
incorrect?

    $ python -c "import os; os.lstat('/etc/kotd')"
    Traceback (most recent call last):
      File "<string>", line 1, in ?
    OSError: [Errno 2] No such file or directory: '/etc/kotd'

and that's not ENOTDIR.

It's likely the comment is wrong, but I know that ENOTDIR (errno 20) wasbeing raised by that block, either because of the chmod or the stat. Ihad thought it was the stat when I checked. I'll check again ...

 - line 1444: no need for an atomic update here?

Do you mean copy the file somewhere else and then rename it into place?

It's definitely not the intent to rename the file into place
straight-away as that would remove it from the source directory
(which is definitely not intended).


Hm.  Still, should the file appear at its destination atomically?  I would
think that would be advantageous, even if it doesn't solve all atomicity
problems.  If so, copy to a temporary location and move that into place.


I'll do the latter then.

 - line 1593: is there any reason not to always have BASE included?

If it's data the caller doesn't need, then I'm just wasting cycles
adding it to the metadata returned and wasting memory.  At least,
that was the thought...


Of course.  It just looked from the description like there's an
insignificant amount of data in that part that isn't duplicated in all the
other catalogs.  It's a list of packages (which is the duplicated part) and
the signatures (which is the tiny part).  Or have I misunderstood?

Actually, the dependency and summary catalog won't contain entries forpackages that don't have dependency or summary data. Realistically,every package will have summary data, but not every package will havedependency data. That's why it iterates over the BASE catalog part andthen merges the extra data as requested.

You're right that BASE's extra data is currently insignificant, andevery place I currently use it wants it, so perhaps the best option isto always include it for now. If it becomes an issue later I can alwaysrevert to what I'm currently doing.

 - line 1838: this is only used in some tests now; perhaps it can be
   retired?  Same with extract_matching_fmris() now, too.  I see you
   added a "reverse" argument, but I don't see it used anywhere.

I'd like to eventually re-write __inventory to use get_matching_fmris()
as it seems useful to have that sort of functionality in the catalog.


Sure.  And it looks like you've rewritten extract_matching_fmris()
substantially, too.  But if it doesn't do what you want, and it's not being
used, why keep it?

Hmm; currently it's a trivial wrapper (get_matching_fmris) and onlyextract_matching_fmris is used anywhere (tests) so I went ahead andreplace get_matching_fmris calls with extract_matching_fmris everywhereand dumped get_matching_fmris.


Cheers,
--
Shawn Walker
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss

Re: [pkg-discuss] [REVIEW] Phase 2 of Catalog v1 Format Changes

Reply via email to