Re: [pkg-discuss] Code review for 9290 please (dependency generator)

Danek Duvall Tue, 07 Jul 2009 16:55:07 -0700

On Thu, Jul 02, 2009 at 06:36:25PM -0700, Brock Pytlik wrote:

> Danek Duvall wrote:
> >On Tue, Jun 23, 2009 at 06:01:11PM -0700, Brock Pytlik wrote:
> >
> >>http://cr.opensolaris.org/~bpytlik/ips-9290-v2/
> >
> > You have a lot of code that deals with variants, but you ignore facets
> > entirely.  Don't facets have some impact on dependency analysis, and
> > hence shouldn't many of the uses of "variant" in your code really be
> > "varcet"?
>
> They do, but I'm pretty sure they're going to need to be handled
> separately from variants. The impact of a variant being set or not is
> different from a facet being set. I thought about facets a while back,
> but after talking with Bart, I decided that they would introduce an
> unneeded level of complexity into this already complex wad. In short, I
> would guess that they will need to be carried around separately,
> depending on exactly how the implementation is done.


Okay.

> >There appears to be no man page.
> That's correct. I planned to deliver the man page with the second
> phase of put back. Given the extent of these changes, and the place
> that the second phase wad is at, I'm leaning heavily towards just
> combining them both, rather than doing two put backs.

That's fine; I just think it made this review harder, not having a
top-level explanation like a man page or other documentation would provide.

> > However, you might want to allow for a richer set of "reason"s than
> > just a single piece of information.  I don't see anything preventing
> > extension of that in the future, but keep it in mind.
>
> I'm happy to have either a richer set of reasons or additional tags to
> store whatever info is needed. The reasons are there currently for a)
> debugging info and b) to point the resolver in the right direction to
> handle the dependencies. As you said, I don't see anything that will
> restrict this in the future.

As long as they're considered Private interfaces between the analyzer and
the resolver, it doesn't really matter; those two pieces should always be
updated in concert, at least for now.

> >manifest.py:
> >
> >  - Given that so far this file doesn't have any client-specific imports,
> >    it might be nice not to do that.
> I can have get_all_variants return a dictionary, and make the client
> responsible for making a variant out of it for now. I'll also file a
> bug to move the variant module out of client and into a shared area.

Works for me.

> >  - This isn't your issue, but get_variants() is defined twice in the file.
> Ok, I'll clean that up
> [ ... ] Actually, I missed this at first but it's defined once in
> Manifest and once in CachedManifest, so I think both should stay.

Okay, I missed that, thanks.

> >variant.py:
> >
> >  - line 91: sets have an issubset() method
> Right... but Variants are dictionaries (more or less). Are you
> looking to have the function renamed, or were you saying I could use
> that instead?

Renamed; I just figured that in the aspects where variants act like sets,
you might as well use the same method names.  This will prevent confusion.
Of course, if the behavior is different, then it'll add confusion.

> >  - line 104: why?
> Why a RuntimeError or why can't you do a diff when the keys aren't a subset?

The latter.

> It's an error because for the purposes I'm using it for, self should
> always be a superset of the var.

Hm, okay.  Seems like an assertion to me, but a more appropriate exception
eventually is probably fine.

> >  - line 111: fmris have a tuple() method; perhaps this should follow?
> I'm happy to name this tuple instead.
> >  - line 112: Ah, but what exactly *is* the "tuple form of self"?
> Ok, I'll expand the docstring.
> >  - line 113: I'm guessing "return tuple(self.items())" isn't quite what
> >    you want.  Perhaps
> >
> >        return sum(self.iteritems(), ())
> >
> >    that won't do the sorting for you, but you could enforce the sorting
> >    elsewhere ...
> The format is (<key>, (<value for that key>, <value for that key>,
> <value for that key>...), <key>, (<value..>,...) ...), but since I
> think to_tuple is no longer actually being used by the code, I'm
> likely to just nuke this.

Eh, okay.  Though I think my sum() concoction would give you exactly the
format you describe.

> >base.py:
> >
> >  - line 109: why this and not a merge()?
> I don't understand how merge would be used here... If you mean
> instead of the iteration over pkg_vars to add more things into vars,
> it's because vars.merge(pkg_vars) will always equal pkg_vars.

Because vars.keys() is always a subset of pkg_vars.keys()?

> This is to handle the fact that if a package was published against x86
> and sparc, having no variant.arch tag and having variant.arch={x86,
> sparc} (if that was a valid action tag) are identical. To make the logic
> on manipulating variants simpler, we add in the pkg's variant tags only
> where the action didn't define a variant for that tag.

Okay.

> >  - line 123: I might do
> >
> >        r = cmp(...) or \
> >            cmp(...) or \
> >            cmp(...)
> I'd rather not do that, but I will if it really matters.

It doesn't.  Just seems clearer to me.

> >  - line 125: so dep_key() and action_path() return very very different
> >    things -- the former (according to the docstring) is the target of the
> >    dependency, while the latter is the source of the dependency.  Is it
> >    appropriate to sort dependencies by these two disparate things?
> Why wouldn't it be?

Just seems weird.  I think it felt wrong at the time, but if it's just to
have some sane sorting order, I guess that's fine.

> >  - line 153: "dep_path" is what, the path of the file that generated the
> >    dependency, or the path of the file that resolves the dependency?
> The path to the file that resolves the dependency. The path to the
> file that generated the dependency is stored in the action.

Okay.  Docstring.

> >  - line 161: "an index"?
> Would "a dictionary work better?

Yes, thanks.

> >  - line 170: You might want to XXX-note that this is only useful when the
> >    installed system matches delivered_files.  Eventually, you'll have to
> >    do the symlink resolution yourself.
> Yep, I was aware of this issue, I'll add the XXX note.
> [ ... ] Actually, I don't think I will need to do the symlink resolution,
> in this context. These are reference real files in the proto area(s) and
> on disk. It would seem to me that we can actually rely on realpath and
> normpath here. For doing the resolution against packages that actively
> being published, I agree and know that eventually we'll need to deal with
> that.

Okay; it just looked like you were operating on a thing that was in memory,
not on disk.  And there's nothing in the code here that would prevent it
from doing that, except for the call to realpath().  I'd put a comment here
that you're making this assumption, and that it's okay.

Note, though, that if you've got a (relative) link in a proto area, the
path you'll get back will be in the proto area, too, not something you'd
want to stick in a manifest.  Not sure if this matters to you in this
method.  Another assumption to document?

> >  - line 175: why not simply return a value on error, and None otherwise?
> Assuming you're talking about resolve, the reason is that other
> functions return something other than self.ERROR paired with a
> value.

What other functions are called in the same context, and where is that
context?

> >check_deps.py
> >
> >  - You've got some terminology confusion here.  The module is named
> >    "check_deps" but on lines 44 and 127, you define functions talking
> >    about "list"ing dependencies, and implicit ones at that.  I think I
> >    agree with Shawn here -- the module should be part of
> >    pkg.publish.dependencies (which should perhaps be pkg.publish.depend),
> >    and it should be about "analyzing" packages or manifests to determine
> >    what its dependencies are.  I don't understand what your objection is
> >    to putting things in __init__.py or to using the word "analyze" in this
> >    context.  I would put a function in the "pkg.publish.depend" module
> >    called "analyze", which would depend on two private functions called
> >    "__remove_internal_deps()" and "__analyze()" (as the latter does most
> >    of the real work of the front-end "analyze()")
> I'm fine with moving this to modules/publish/depend/analyze.py. It was
> called check_deps because, eventually, that's what it would up doing, but
> I'll bow to the crowd on this one. However, it was my intention to have
> the modules for each type of dependencies live in their own subdirectory
> like the actions and bundle types do. Would you like that directory to be
> modules/publish/depend/types/? And have analyze in the depend directory?
> How would modules/publish/analyze_dependencies.py and
> modules/publish/depend/* work?

There's another place we're going to need per-filetype code: content
hashes.  Thinking ahead to that, I'd probably pull this out of publish
entirely.  We've been talking about "tasting" files to determine what they
are, so perhaps pkg.flavor.{elf,script,smf,jar,etc}?

> >  - I also think "show_internal_deps" is poorly named.  Perhaps
> >    "remove_internal_deps" and switch the meaning of the flag?
> Sure, but I don't see any difference between those two.

I guess it's that "show" implies a UI component of some sort, while
"remove" (or "exclude", or "include") doesn't.  Part of code elegance is
having a consistent language and set of metaphors for the names of the
objects your code uses.  Mixing metaphors is as bad in code as it is in
spoken language.

> >  - line 55, 58: how are plat and isalist supposed to work?  A dependency
> >    on $PLATFORM is evaluated at load-time, so if a module under $PLATFORM
> >    is delivered in different packages depending on $PLATFORM (I can't
> >    actually find an example), and something in /kernel/drv depends on it,
> >    how do you resolve which package it's supposed to come from?  If you
> >    (or whoever is running your code) chooses one or the other, then you
> >    end up with a dependency that is correct on one platform and not on
> >    another (and indeed doesn't actually get that dependency resolved).
> >    You could depend on both, which is probably the sanest answer, since
> >    it's simple, and shouldn't be a big deal for minimization.  Or you
> >    could facet it.  Similarly for $ISALIST.
> I don't have a good solution here. I thought this was better than
> just punting on them, I'm happy to do that if it's preferred and
> file a bug to try to get it more perfect in the future.

Punt for now, I think.  The impact is limited.  We can revisit this again
once we've decided how to deal with platforms more generally.

> >  - line 166: I think this really needs to look more like a dispatcher.
> >    Your internal file tasting mechanism should return something that can
> >    be plugged into a table that maps file types to processing methods, and
> >    you should be catching base-class exceptions.
> I think you're using dispatcher with a more specific technical
> meaning than I had for it. I'll talk to you next week to make sure
> I'm on the same page. For the rest, I understand and will switch to
> that.

Okay.  I think I'm using dispatcher in a way that matches the rest of the
comment, but we can chat.

> >  - line 198: so ... what does "missing" mean in this context?  That the
> >    analyzer for the file type is missing?
> Yep.

Okay, please document.  Probably by documenting the return values of the
method in the docstring.

> >pound_bang.py:
> >
> >  - line 73: doing this right and dealing with the use of readlink() I
> >    mentioned before may very well be the same thing.  I'm not sure this is
> >    worth doing wrong.  It certainly doesn't hold on Linux.  At the very
> >    least, there should be a trailing slash to make sure it's not
> >    /binsomething.
> I've missed where you talked about the use of readlink() before. In
> general, I'm confused by this comment so I'll find you next week to
> get a better handle on what the problem is.

The previous reference was my comment for line 170 of base.py -- realpath
and readlink are closely related, and I must have forgotten what you'd used
there.  At any rate, my point is that resolving through a symlink probably
should be done "right" rather than this hacky way which isn't going to work
on all platforms (because /bin is a separate directory on Linux).  If Tom's
okay with this, though, then it's probably okay for now.

> >  - line 85: shouldn't this be "return [], []", or is the tuple return on
> >    line 84 incorrect?  process_elf_dependencies() returns a single list,
> >    but process_python_dependencies() returns a tuple of lists.
> It should be return [], [], but the other differences are
> intentional (in the since that they're as designed). If the code
> changes as you suggested for the dispatcher, I guess I'll need to
> commonize the return types of these functions.

Yeah.

> >depthlimitedmf.py:
> >
> >  - line 21: is there anything more to this method than adding the first
> >    line to the one from the parent class?  Similarly for load_module()?
> >
> For run_script, I'll call the super method. For load_module, lines
> 53-58 are new/changed, so that probably needs to stay.

Yeah, I just figured you could get the debug message at the wrong point and
it wouldn't hurt.  We've got dtrace for that, anyway.

> >pkgdep.py:
> >
> >  - My guess is that you'll usually want to emit the original manifest in
> >    addition to the dependencies, rather than usually omitting it.
> My guess was that when users were running it by hand, they'd mostly
> want it without the original manifest, while scripts would want the
> original manifest. That's why I chose the default I did, I can
> change it if I'm wrong.

Huh.  I would expect the full thing when running it by hand, but I've shown
time and again I'm weird.  I'm open to being shown weird again.

> >  - Given that you're not treating "missing" files as error messages, they
> >    should probably be emitted as actions, not bare strings.  But if I
> >    understand what's these actually are, I'd guess that a) they should be
> >    sent to stderr, and b) they should only be emitted in a verbose mode,
> >    because mostly you won't care.
> I put the flag because mostly you won't care (I think that's the
> verbose mode you wanted...).

Okay.  Duh.  

> I'm fine with sending them to stderr.

That'd be good.

> >  - I would expect that the resolver is run from the same command as the
> >    analyzer, which would suggest two subcommands for pkgdep: "analyze" and
> >    "resolve".
> Yeah, I ended up doing that in my resolver wad. I went with generate
> and resolve. I prefer generate over analyze, though if you care,
> I'll change it.

I do care -- "generate" sounds like you're creating stuff out of whole
cloth; analyze sounds like you're figuring out what's already there.  Maybe
others have a different feel for it.

Thanks,
Danek
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss

Re: [pkg-discuss] Code review for 9290 please (dependency generator)

Reply via email to