Re: [pkg-discuss] Code review: pkgrepo verify and fix

Tim Foster Thu, 13 Dec 2012 23:02:29 -0800

Hi Shawn,

Thanks for the thorough review, much appreciated. I've spent some timereworking the internals, and have a new webrev and incremental at:


https://cr.opensolaris.org/action/browse/pkg/timf/pkgrepo-verify-1/

There's minimal changes to t_pkgrepo over what I posted last time. Morecomments below.


On 12/12/12 12:46 PM, Shawn Walker wrote:

On 12/06/12 21:29, Tim Foster wrote:

Hi there,

I've got a set of changes here for code review, if anyone has time:

https://cr.opensolaris.org/action/browse/pkg/timf/pkgrepo-verify/pkgrepo-verify-webrev/


----------------------------------------
src/man/pkgrepo.1:
    line 781:  suggest:
      Verifies the following attributes of the package repository
contents are correct:

    line 815: s/over/using/

    line 881: This seems a little too vague.



That all sounds good. I've used
'Include output detailing the errors found during repository verification.'

----------------------------------------
src/modules/client/transport/repo.py:
    line 207, 212: I know it seems silly, but I'd prefer we prefix the
function names with something like 'publish_' to be consistent and avoid
potential (although unlikely) namespace conflicts.


It is silly, there's no publication going on here. Off-list, we

decided we agreed that using the transport at all wasn't worth thetrouble, so I've removed that.

    line 1039, 1046: s/IO/I\/O/


Yep.

----------------------------------------
src/modules/server/repository.py:
    line 268: This implies that versions other than 4 are supported,
      but the rest of the code clearly limits support to version 4.
      I know that in the future this may change, but I don't think
      the message should account for that now.


Right, but I'm not using this exception; this is fixing an error
with the existing RepositoryVersionError, which previously said:

+++
return("The repository at '%(location)s' is version "
 251 "'%(version)s'; only versions up to are supported.") %
+++

I'm using RepositoryInvalidVersionError, which requires a
specific version.

    line 270, 304: two newlines between classes


Thanks.

    line 1877: why not RepositoryUnsupportedOperationError() ?


Did you mean 1887? I think I was originally planning on reporting
a different error message if we had a repository that didn't have
any publishers available, but that didn't happen in the end.
RepositoryUnsupportedOperationError is fine.

    lines 1853-2403: I don't think that verify() should ever use a
progress tracker to provide information about the issues found.
Instead, it should either yield each chunk of information just as
Image.verify() does or it should accept a callback function that the
consumer can provide to accept the information and let the client handle
the output.  In particular, I don't like how the formatting of the
messages has been encoded into the verify() routine itself because of
the current approach.  Also, to greatly improve the readability /
maintainability of this function, I would not use nested function
definitions for each piece of logic.  I would instead make each one a
private function like 'def __verify_yield_error' or 'def
__verify_get_hashes'.  Read on below for additional suggestions.

Yep, that's reasonable (if a bit of a pain to re-jig) but definitelyworth doing. I've turned both verify() and fix() into generators, andall of the error message formatting is handled in pkgrepo.py now instead.

    line 1855: extra newline


Yep, fixed.

    lines 1895-1898: I'm not thrilled with dummy FMRIs for these cases.
      Is there a reason we have to output FMRIs and can't just use blank
      lines or something like "Unknown Package" "Repository" ?


Fair enough - the progress tracker I was using got upset if we tell it
we're iterating over FMRIs and don't actually hand it one - in these
cases because we've been unable to open a valid manifest to get the
FMRI name. I've changed the progress tracker to do the right thing
here, reporting "Unknown Package" instead.

The 'permissions_fmri' is slightly different, because it's a
potentially long-running process (hitting every file in the repository)
and I was looking to give some sense of what was being performed.
Instead, I've used 'None' for the fmri, and added a keyword argument to
indicate we should emit a message saying that we're doing a repository
scan.

In both of these cases, I'm relaxing the checks in these new
ProgressTracker methods to allow 'None' fmris.

    line 1966: did you intentionally avoid 'signature' actions?  I also
think we should avoid hardcoding 'file' and 'license' here and instead
look for 'has_payload' or the 'hash' attribute so that if we add
additional actions that can have payloads in the future, we'll get
those.  In fact, I think the release notes stuff Bart added may work
this way...


Yes - in this case, we're looking to print details
of the effects this would have by making this package unavailable
(in case it wasn't obvious to the user by the FMRI alone) by printing
the broken path in the package. Signature actions don't have a path, so
they're out of the picture here.  You could argue that the path within
the package is meaningless, given we know the package is broken anyway,
I just feel it provides more visibility into the extent of the damage,
rather than just printing a series of sha-1 hashes.

More generally, we don't have an iterator like
'gen_actions_by_payload' and I'd don't want to iterate over all actions
in the manifest looking for those with payload. However, you're right
in that we might add other actions that do deliver a payload, and
hardcoding the action names is bad. Instead, I've added a new member to
pkg.actions, 'payload_types', similar to 'types', so if we do add new
actions with payload that have a 'path' attribute, this will now work.
I expect 'payload_types' could be useful elsewhere too.

    lines 1974-2025: Per my earlier comment, I would yield these back to
the caller as a simple list of strings without the formatting.  If
desired, you could do it as a tuple of the form (error_summary,
error_details) where error_summary is something like 'Missing file:
4bca78af3bac' and error_details are the rest of the text.  I would
expect the consumer to then decide how to indent or otherwise display
the resulting text.  You also shouldn't pre-wrap the output.  Let the
caller use the Python textwrap module if they want to wrap the output.
I realise this is all more work, but eventually I'd like to make it
possible to trigger verify / fix from the web interface and the current
way this is being done makes that difficult at best.


Agreed. Hopefully this is cleaner now.

    lines 2249-2395: You might consider also placing this into a private
function so you can improve the readability a bit; just wrapper the call
to it with the try/except you have now.


Good idea, this is now all moved into self.__gen_verify(..)

    line 2403: seems like this should be before the call to rmtree;
      if the rmtree fails we won't unlock, but maybe that's the desired
      intent?


Good point - we should unlock before trying something more risky.
I care less about the shutil failing, since it's to a tmpdir, and even

if another repo verify job arrived at that point, we'd be using adifferent tempdir then, however, a locked repository would be bad.

Fixed.

    line 2411: s/referred/referred to/ ?


Yep.

    lines 2411-2414: Doesn't this mean that files can be orphaned?
      I admit it's rather expensive to go hunt down every package that's
      referencing the unquarantined files, so perhaps we should just punt
      on this and add it to 'pkgrepo cleanup' later.  (The assumption
      being that 'pkgrepo cleanup' would remove orphaned content.)


Yes, files can be orphaned. My thoughts were that since we've
now quarantined all broken files, we're very likely to run a
pkgrecv/pkgsend job to repopulate the repository, so those files
wouldn't be orphans for long.


    line 3536: 'given packages' ?  Seems like it should be:
       Verifies that repository contents matches expected state for all
or specified publishers.


Yes, that's better. I had been seeing whether we'd care about
verifying only particular packages, but decided it probably wasn't worth
the trouble given the way I'm iterating over the raw repository
contents.

    line 3577: Why re-raise UnqualifiedFMRIError here?  That doesn't seem
right.  I think the normal exception should be sufficient.


Yep, no idea what I was doing there, sorry. (I wonder was I
thinking about verify/fixing specific packages?)

----------------------------------------
src/pkgrepo.py:
    lines 158-160: These seem incomplete

    line 1240: s/verify/verification/

    line 1247: drop the 'else:' de-indent the following line

    line 1252: s/index,/index/ ?

    line 1249, 1323: two newlines between functions
----------------------------------------
src/tests/cli/t_pkgrepo.py:
    line 1748: s/0755/misc.PKG_DIR_MODE/

    line 1750: s/0644/misc.PKG_FILE_MODE/



Thanks, I've fixed all of these.

----------------------------------------
src/tests/pkg5unittest.py:
    lines 2129-2136, 2552-2554: The other way to do this is as follows:

     def get_su_wrap_user(uid_gid=False):
       import operator
       for u in ["noaccess", "nobody"]:
         try:
           pw = pwd.getpwnam(u)
           if uid_gid:
             return operator.attrgetter('pw_uid', 'pw_gid')(pw)
           return u
         except (KeyError, NameError):
           pass
         raise RuntimeError("Unable to determine user for su.")
     ...
     if su_wrap:
       os.chown(f_path, *get_su_wrap_user())

     That's mildly improved since you get a tuple of return values.


Oh yes, much nicer - thanks.

        cheers,
                        tim

_______________________________________________
pkg-discuss mailing list
pkg-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss

Re: [pkg-discuss] Code review: pkgrepo verify and fix

Reply via email to