Re: [pkg-discuss] [REVIEW] Phase 2 of Catalog v1 Format Changes

Danek Duvall Wed, 09 Sep 2009 09:18:11 -0700

Shawn Walker wrote:

> So, I've obviously been a bit opaque on package states, so let me
> attempt to clarify:
> 
> PKG_STATE_KNOWN - I axed this, since, in my mind, 'known' means that
> the package system has knowledge *of* the package; and as such, all
> 'packages' are *known* to the packaging system if they are installed
> or available in a package repository's catalog.
> 
> The only reason that this still says 'known' is because I wanted to
> defer changing the output of pkg list.  My thought was to dump the
> current 'state' column and add an 'A' column (for available) to the
> UFIX set of columns we currently have.  I can do so in this
> changeset. Either way I need to open a bug for this.
> 
> PKG_STATE_AVAILABLE - This state replaces PKG_STATE_KNOWN.  This
> state is only applied to packages that are currently *available* in a
> repository catalog.


I don't really care, since this is all deep internals, but I think this is
gratuitous.  I don't see how "known" is any different from "available".
Neither English word really implies that the package isn't installed, which
is, as far as I can tell, your complaint with the word "known".  So either
word is just fine as a technical term that means "known to the system, but
not installed".

As far as the output of "pkg list" is concerned (and I think you're not
changing that here, right?), please make sure Stephen is aware of any
change you're making, as he's the one who designed the original, including
the separation between the STATE and UFIX columns.

> >api.py:
> >  - line 105(?): do you no longer need to call load_config() on the image?
> 
> Nope.  That's one of those things I wanted to get away from.  I want
> callers to know as little as possible about the Image object.

This is the API, though -- if it does the work, then no callers have to.
The change is fine, though.

> >  - line 1080: this either/or seems a little sketchy to me.  Wouldn't you
> >    care almost exclusively whether the package was installed with the
> >    preferred publisher at the time, and not really care about what the
> >    current preferred publisher is?
> 
> The sketchiness is intentional as it mimics exactly what we previously
> did.  As an example, if a publisher was preferred previously, then its
> prefix would not be included when calling get_pkg_stem().  In addition,
> when a package was installed, if the publisher it was installed from was
> preferred at the time of installation, then the publisher's prefix was
> written out to that package's installed file with __PRE__ intact.  This
> had the effect of causing get_pkg_stem() calls on FMRIs of packages of
> the *current* preferred publisher and *previously* preferred publishers
> being omitted.
> 
> This method's existence is mostly a backwards-compatibility hack for
> modules/client/api.py's plan_update_all.

Um, then it could simply be gotten rid of, no?  There's no sense in
carrying around backwards-compatibility hacks for internals.

> >manifest.py:
> >
> >  - line 503: why this change?  For short lines, this could end up being
> >    slow.
> 
> Some of our manifest files are rather gigantic (10+ megabytes?).  I
> figured this was a bit more memory friendly, but I'm willing to
> change this back.

<shrug>  Ideally, you'd batch the lines up in groups of a particular size,
but failing that, I doubt it matters much which of the easier routes you
take.

> >  - it seems that fmris() and entries() could probably be defined in terms
> >    of each other; there's a lot of shared code here.
> 
> This was purely a performance guess on my part; that is, that fmris
> simply calling entries but not yielding the entry data itself would
> be slower than simply doing what it is doing now.  However, I'm
> willing to change it.

Yeah, I dunno.  It always pains me to see multiple blocks of code which are
almost identical, simply because of the maintenance burden.  If there's a
performance gain to be had, there probably should be a comment explaining
it.

> >  - line 574: using a generator might be a bit less memory-intensive than a
> >    list.
> 
> If you mean returning a generator, it was intentional that I did not.
> The point is to return the *unique* set of package names for all
> publishers within the catalog.

Sorry, I was unclear.  I meant using a generator expression instead of a
list as the basis for the set:

    return set((
        stem
        for pub in self.__data
        for stem in self.__data[pub]
    ))

So for n-thousand package stems you won't have the entire copy in memory
both as a list and as a set; you'll just have the set, and the generator
expression will only spit one out at a time.

> >  - line 1273: I don't see that behavior.
> 
> I did while running the test suite on b117 :(  I'll check again, but
> I know I didn't imagine this.

I'm sure you didn't; perhaps your description of the condition is
incorrect?

    $ python -c "import os; os.lstat('/etc/kotd')"
    Traceback (most recent call last):
      File "<string>", line 1, in ?
    OSError: [Errno 2] No such file or directory: '/etc/kotd'

and that's not ENOTDIR.

> >  - line 1444: no need for an atomic update here?
> 
> Do you mean copy the file somewhere else and then rename it into place?
> 
> It's definitely not the intent to rename the file into place
> straight-away as that would remove it from the source directory
> (which is definitely not intended).

Hm.  Still, should the file appear at its destination atomically?  I would
think that would be advantageous, even if it doesn't solve all atomicity
problems.  If so, copy to a temporary location and move that into place.

> >  - line 1593: is there any reason not to always have BASE included?
> 
> If it's data the caller doesn't need, then I'm just wasting cycles
> adding it to the metadata returned and wasting memory.  At least,
> that was the thought...

Of course.  It just looked from the description like there's an
insignificant amount of data in that part that isn't duplicated in all the
other catalogs.  It's a list of packages (which is the duplicated part) and
the signatures (which is the tiny part).  Or have I misunderstood?

> >  - line 1838: this is only used in some tests now; perhaps it can be
> >    retired?  Same with extract_matching_fmris() now, too.  I see you
> >    added a "reverse" argument, but I don't see it used anywhere.
> 
> I'd like to eventually re-write __inventory to use get_matching_fmris()
> as it seems useful to have that sort of functionality in the catalog.

Sure.  And it looks like you've rewritten extract_matching_fmris()
substantially, too.  But if it doesn't do what you want, and it's not being
used, why keep it?

Danek
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss

Re: [pkg-discuss] [REVIEW] Phase 2 of Catalog v1 Format Changes

Reply via email to