Re: [pkg-discuss] Code review for 9290 please (dependency generator)

Brock Pytlik Thu, 02 Jul 2009 18:37:21 -0700

Danek Duvall wrote:

On Tue, Jun 23, 2009 at 06:01:11PM -0700, Brock Pytlik wrote:

http://cr.opensolaris.org/~bpytlik/ips-9290-v2/


You have a lot of code that deals with variants, but you ignore facets
entirely.  Don't facets have some impact on dependency analysis, and hence
shouldn't many of the uses of "variant" in your code really be "varcet"?  I
can understand that facets aren't fully implemented and testable, but I'd
hate to see them be introduced and have a bunch of variable and method
names in this wad either need to change or be misleading.

They do, but I'm pretty sure they're going to need to be handledseparately from variants. The impact of a variant being set or not isdifferent from a facet being set. I thought about facets a while back,but after talking with Bart, I decided that they would introduce anunneeded level of complexity into this already complex wad. In short, Iwould guess that they will need to be carried around separately,depending on exactly how the implementation is done.

There appears to be no man page.

That's correct. I planned to deliver the man page with the second phaseof put back. Given the extent of these changes, and the place that thesecond phase wad is at, I'm leaning heavily towards just combining themboth, rather than doing two put backs.

Just for comparison, the analyzer I'd had spat out actions like this:

    file d4c3eb7039ba48253b69a86ed2663124e3f50832 group=bin mode=0755 \
        owner=root path=usr/lib/amd64/libbrand.so.1 
pkg.tmp.elf.dependencies=libc.so.1 \
        pkg.tmp.elf.dependencies=libxml2.so.2 
pkg.tmp.elf.dependencies.libc.so.1=SUNW_1.22 \
        pkg.tmp.elf.dependencies.libc.so.1=SUNW_1.19 
pkg.tmp.elf.dependencies.libc.so.1=SUNW_1.1 \
        pkg.tmp.elf.dependencies.libc.so.1=SUNW_0.9 
pkg.tmp.elf.dependencies.libc.so.1=SUNW_0.8 \
        pkg.tmp.elf.dependencies.libc.so.1=SUNW_0.7 
pkg.tmp.elf.dependencies.libxml2.so.2=SUNW_1.3 \
        pkg.tmp.elf.dependencies.libxml2.so.2=SUNW_1.1 
pkg.tmp.elf.info.arch=i386 \
        pkg.tmp.elf.info.bits=64 pkg.tmp.elf.info.end=lsb 
pkg.tmp.elf.info.osabi=none \
        pkg.tmp.elf.info.type=so pkg.tmp.elf.versions=SUNWprivate 
pkg.tmp.filetype=elf

and the resolver ended up tacking on

    pkg.tmp.introduced_deps=pkg:/[email protected],5.11-0.101 \
    pkg.tmp.introduced_deps=pkg:/[email protected],5.11-0.101

in addition to the actual depend actions.  Storing all this info in depend
actions is fine, and you certainly don't need all the ELF versioning bits I
had (someday, maybe, but not now) nor all the other ELF data; I just threw
everything in.  However, you might want to allow for a richer set of
"reason"s than just a single piece of information.  I don't see anything
preventing extension of that in the future, but keep it in mind.

I'm happy to have either a richer set of reasons or additional tags tostore whatever info is needed. The reasons are there currently for a)debugging info and b) to point the resolver in the right direction tohandle the dependencies. As you said, I don't see anything that willrestrict this in the future.


manifest.py:

  - Given that so far this file doesn't have any client-specific imports,
    it might be nice not to do that.

I can have get_all_variants return a dictionary, and make the clientresponsible for making a variant out of it for now. I'll also file a bugto move the variant module out of client and into a shared area.

  - This isn't your issue, but get_variants() is defined twice in the file.

Ok, I'll clean that up

  - get_variants() returns a list.  Perhaps get_all_variants() could return
    a dictionary?  Also, docstring please.

Ok

  - Would you consider

        return dict((
            (name, self.attributes[name])
            for name in self.attributes
            if name.startswith("variant.")
        ))

Yep, looks good to me.

variant.py:

  - line 54: not your bug, but "__ketset" should be "__keyset"

Ok, I'll clean that up

  - line 83: "two sets of variants" or "two variants" or perhaps "two
    Variants objects"?  Could you also make the docstrings correctly
    punctuated sentences, and drop the spaces at the beginning and ends of
    the strings?  I don't know where that style came from, but it's wrong
    wrong wrong.

Ok, I was following the style in the file, but I'll clean mine up.

  - line 86: self[name] = list(set(var[name] + self[name]))?  Or perhaps
    replace the whole if statement with

        self[name] = list(set(var[name] + self.get(name, [])))

    which leads to

        self.update(dict((
            (name, list(set(var[name] + self.get(name, []))))
            for name in var
        )))

I'll probably change + to .extend, but otherwise I'm happy to make thisswitch.

  - line 91: sets have an issubset() method

Right... but Variants are dictionaries (more or less). Are you lookingto have the function renamed, or were you saying I could use that instead?

  - line 100: sets have a difference() method

See above quesiton.

  - line 103: isn't set(self.keys()) equivalent to self.__keyset?  Can you
    access var.__keyset?

Yep, I think you're right.

  - line 104: why?

Why a RuntimeError or why can't you do a diff when the keys aren't a subset?

A RuntimeError because I haven't settled on a broader exceptionpassing/handling setup yet. I was planning that as part of phase two,but see my earlier comments about combining them.

It's an error because for the purposes I'm using it for, self shouldalways be a superset of the var.

  - line 111: fmris have a tuple() method; perhaps this should follow?

I'm happy to name this tuple instead.

  - line 112: Ah, but what exactly *is* the "tuple form of self"?

Ok, I'll expand the docstring.

  - line 113: I'm guessing "return tuple(self.items())" isn't quite what
    you want.  Perhaps

        return sum(self.iteritems(), ())

    that won't do the sorting for you, but you could enforce the sorting
    elsewhere ...

The format is (<key>, (<value for that key>, <value for that key>,<value for that key>...), <key>, (<value..>,...) ...), but since I thinkto_tuple is no longer actually being used by the code, I'm likely tojust nuke this.

os_sunos.py:

  - You're not going to need the power of the full magic database, and I
    don't think the output of the file command is Committed, so I think I'd
    rather see you do the magic without forking off another process.
    Should be faster, too.

I'll look into the amount of code this would take.

  - You're never removing t_path, are you?

Good point.

base.py:

  - line 36: what is "fp"?  File pointer?

File path, I'll change it.

  - lines 42, 56: shouldn't you have a base exception class and have these
    guys defined in elf.py and python.py, respectively, inheriting from
    that base class?

Yep

  - line 84: does "if vs" work?

Hmm, probably would.

  - line 109: why this and not a merge()?

I don't understand how merge would be used here... If you mean insteadof the iteration over pkg_vars to add more things into vars, it'sbecause vars.merge(pkg_vars) will always equal pkg_vars. This is tohandle the fact that if a package was published against x86 and sparc,having no variant.arch tag and having variant.arch={x86, sparc} (if thatwas a valid action tag) are identical. To make the logic on manipulatingvariants simpler, we add in the pkg's variant tags only where the actiondidn't define a variant for that tag.

  - line 123: I might do

        r = cmp(...) or \
            cmp(...) or \
            cmp(...)

I'd rather not do that, but I will if it really matters.

  - line 125: so dep_key() and action_path() return very very different
    things -- the former (according to the docstring) is the target of the
    dependency, while the latter is the source of the dependency.  Is it
    appropriate to sort dependencies by these two disparate things?

Why wouldn't it be?

  - line 145: as we discussed in the hallway yesterday, this should simply
    be a "depend" action, and all the attributes attached to it should have
    an appropriate prefix (and you might as well throw on type=require).

Thanks for thinking of adding require, I'll put that on.

  - line 153: "dep_path" is what, the path of the file that generated the
    dependency, or the path of the file that resolves the dependency?

The path to the file that resolves the dependency. The path to the filethat generated the dependency is stored in the action.

  - line 161: "an index"?

Would "a dictionary work better?

  - line 165: a KeyError waiting to happen?  Or should this simply be a
    named input variable?

Possibly? I'll take a look at other ways of making this work.

  - line 170: You might want to XXX-note that this is only useful when the
    installed system matches delivered_files.  Eventually, you'll have to
    do the symlink resolution yourself.

Yep, I was aware of this issue, I'll add the XXX note.

  - line 175: why not simply return a value on error, and None otherwise?

Assuming you're talking about resolve, the reason is that otherfunctions return something other than self.ERROR paired with a value.

  - line 186: "if p"?

I prefer the exact check against None in this case, since None is whatpossibly_delivered explicitly returns.

elf.py:

  - line 74: given that pkgsend is now growing multiple -d options, it
    might make sense to support multiple proto areas.  You might check with
    Liane to see if she'd want this (not that it can't be changed later,
    but ...).

Sure, but I'll do that as a latter putback. There's nothing inherent inthis code that can't be easily extended to multiple proto areas.

  - line 117: Shouldn't "p" be "tp" here?  Or use "p" through the rest of
    the loop?

It should be "tp" I think. I'll go grumble quietly in a corner about thefrustration of python variables escaping what I consider their contexts.

  - line 123-125, 150-151: presumably these are no longer relevant given
    that you're now doing $PLATFORM and $ISALIST processing.  But I'm not
    entirely sure how that processing is supposed to work, so the questions
    are still unanswered.

150-151 I will remove. 123-125 still apply as far as I know since theonly changes I made here were changing prints to raise exceptions.

  - Worth considering replacing the RuntimeErrors with something more
    sophisticated later on.

Yep, planning on it.

check_deps.py

  - You've got some terminology confusion here.  The module is named
    "check_deps" but on lines 44 and 127, you define functions talking
    about "list"ing dependencies, and implicit ones at that.  I think I
    agree with Shawn here -- the module should be part of
    pkg.publish.dependencies (which should perhaps be pkg.publish.depend),
    and it should be about "analyzing" packages or manifests to determine
    what its dependencies are.  I don't understand what your objection is
    to putting things in __init__.py or to using the word "analyze" in this
    context.  I would put a function in the "pkg.publish.depend" module
    called "analyze", which would depend on two private functions called
    "__remove_internal_deps()" and "__analyze()" (as the latter does most
    of the real work of the front-end "analyze()")

I'm fine with moving this to modules/publish/depend/analyze.py. It wascalled check_deps because, eventually, that's what it would up doing,but I'll bow to the crowd on this one. However, it was my intention tohave the modules for each type of dependencies live in their ownsubdirectory like the actions and bundle types do. Would you like thatdirectory to be modules/publish/depend/types/? And have analyze in thedepend directory? How would modules/publish/analyze_dependencies.py andmodules/publish/depend/* work?

  - I also think "show_internal_deps" is poorly named.  Perhaps
    "remove_internal_deps" and switch the meaning of the flag?

Sure, but I don't see any difference between those two.

  - line 55, 58: how are plat and isalist supposed to work?  A dependency
    on $PLATFORM is evaluated at load-time, so if a module under $PLATFORM
    is delivered in different packages depending on $PLATFORM (I can't
    actually find an example), and something in /kernel/drv depends on it,
    how do you resolve which package it's supposed to come from?  If you
    (or whoever is running your code) chooses one or the other, then you
    end up with a dependency that is correct on one platform and not on
    another (and indeed doesn't actually get that dependency resolved).
    You could depend on both, which is probably the sanest answer, since
    it's simple, and shouldn't be a big deal for minimization.  Or you
    could facet it.  Similarly for $ISALIST.

I don't have a good solution here. I thought this was better than justpunting on them, I'm happy to do that if it's preferred and file a bugto try to get it more perfect in the future.

  - line 78: m.get_all_variants() already returns a Variants object.

It won't now, so this one will need to remain.

  - line 85: We've been using "mfst" for the four-letter abbreviation for
    "manifest".

ok


  - line 166: I think this really needs to look more like a dispatcher.
    Your internal file tasting mechanism should return something that can
    be plugged into a table that maps file types to processing methods, and
    you should be catching base-class exceptions.

I think you're using dispatcher with a more specific technical meaningthan I had for it. I'll talk to you next week to make sure I'm on thesame page. For the rest, I understand and will switch to that.

  - line 198: so ... what does "missing" mean in this context?  That the
    analyzer for the file type is missing?

Yep.

pound_bang.py:

  - I don't care too much, but I think "shebang" is a better term for this.
    If that's too arcane, then perhaps "script".

Ok, script works for me.

  - line 69: drop the slash.  It can be separated from the shebang by any
    amount of whitespace, and needn't exist at all.

Ok

  - line 70: usedlist?  I know you were looking for solaris.py
    compatibility, but that's going too far.  :)

Heh :)

  - line 71: single-line comments don't wrap.  Either it fits on the line,
    or put it on the line before.

I'll clean that up.

  - line 73: doing this right and dealing with the use of readlink() I
    mentioned before may very well be the same thing.  I'm not sure this is
    worth doing wrong.  It certainly doesn't hold on Linux.  At the very
    least, there should be a trailing slash to make sure it's not
    /binsomething.

I've missed where you talked about the use of readlink() before. Ingeneral, I'm confused by this comment so I'll find you next week to geta better handle on what the problem is.

  - line 85: shouldn't this be "return [], []", or is the tuple return on
    line 84 incorrect?  process_elf_dependencies() returns a single list,
    but process_python_dependencies() returns a tuple of lists.

It should be return [], [], but the other differences are intentional(in the since that they're as designed). If the code changes as yousuggested for the dispatcher, I guess I'll need to commonize the returntypes of these functions.

depthlimitedmf.py:

  - line 11: not used?

It was while I was debugging, I'll pull it.

  - line 21: is there anything more to this method than adding the first
    line to the one from the parent class?  Similarly for load_module()?

For run_script, I'll call the super method. For load_module, lines 53-58are new/changed, so that probably needs to stay.

pkgdep.py:

  - My guess is that you'll usually want to emit the original manifest in
    addition to the dependencies, rather than usually omitting it.

My guess was that when users were running it by hand, they'd mostly wantit without the original manifest, while scripts would want the originalmanifest. That's why I chose the default I did, I can change it if I'mwrong.

  - Given that you're not treating "missing" files as error messages, they
    should probably be emitted as actions, not bare strings.  But if I
    understand what's these actually are, I'd guess that a) they should be
    sent to stderr, and b) they should only be emitted in a verbose mode,
    because mostly you won't care.

I put the flag because mostly you won't care (I think that's the verbosemode you wanted...). I'm fine with sending them to stderr.

  - I would expect that the resolver is run from the same command as the
    analyzer, which would suggest two subcommands for pkgdep: "analyze" and
    "resolve".

Yeah, I ended up doing that in my resolver wad. I went with generate andresolve. I prefer generate over analyze, though if you care, I'll change it.

Danek

New webrev will take a bit of time to generate, hopefully sometime nextweek it'll be ready.


Brock
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss

Re: [pkg-discuss] Code review for 9290 please (dependency generator)

Reply via email to