Re: [pkg-discuss] [REVIEW] local publishing / pkgsend revamp

Shawn Walker Tue, 10 Feb 2009 09:27:52 -0800

[re-send due to mail outage]

Danek Duvall wrote:

On Mon, Feb 09, 2009 at 03:33:29AM -0600, Shawn Walker wrote:

http://cr.opensolaris.org/~swalker/pkg-2691/


Any chance you can cook up an auxiliary webrev that does diffs between the
old repository.py and the new depot.py?  It's pretty hard to review these
changes as-is.  I haven't reviewed it yet; if you can't or don't want to do
this, say so, and I'll figure it out myself.


Sorry about that:
http://cr.opensolaris.org/~swalker/pkg-2691/repo-depot-diff.txt

I've removed any thing I've addressed from your reply.

depot.py:

...

action/__init__.py:

  - line 106: why the change from position to position+1?


Because I didn't believe users are going to realise that the position is
zero-indexed; however, I'm willing to change this back if you think I'm
being too cautious.

  - line 158: perhaps "... and action.attrs.get(ka, None) is None"?  I'd
    speed test both, to see.


Was consistently a few hundred actions/sec. slower.

action/generic.py:

  - line 36: you might use cStringIO instead; it should be faster, and
    maybe more memory efficient.  Though I see at least in one place, you
    changed it from cStringIO, so perhaps you have a reason?


I did it because of the recent rumblings about not using C-based python
stuff.  However, I've changed this back to cStringIO.


  - line 122: why is this a problem?


It causes the process to hang if pkg.size isn't provided.  Call this
"slap the hand of people trying to break my stuff on purpose."

  - line 155, 167: these clauses can be pretty bad on the memory usage.


Sorry, I'm dense at the moment.  Explain a bit more?

    Where do we end up using them?  It might help to do something like


The only time this gets hit at the moment is during publishing to a
remote server.  The receiving server, when it rebuilds the action, hands
off a request rfile object to the action class which only supports
read().  I was trying to be efficient and allow objects that support
seek() and tell() to get their size calculated in a better way than
sucking in all the data from a file.

Technically, I think I only need lines 164-166 if the object supports
"read" and I could dump the rest for our current needs.

        data = StringIO.StringIO(data.read())
        data.seek(0, 2)
        sz = data.tell()
        data.seek(0)

  - line 160: you might take a look at the suggestion in

        http://mail.python.org/pipermail/python-list/2006-March/375280.html

    to get SEEK_SET, etc.


Why do I need SEEK_SET again?

action/license.py:

  - line 108: why are you doing this?  Ask for forgiveness, etc.  I'd say


Apparently a nun hit my head instead of my hand with a ruler, so I'm not
sure I follow completely, but...

I'm going to assume that you meant I should catch an ENOENT instead of
checking for os.path.exists here.  So changed.

    the same about the size calculations in generic.py, but I see how that
    would get very indented (I'd still suggest you do it that way, though;
    perhaps splitting clauses out into separate functions).


I don't follow the comment about size calculations, unless the number of
statements within a try/except is the concern.

publish/transaction.py:

  - line 114: Why is this necessary?  Can't you just use __repo_cache?


NameError: global name '_TransactionFile__repo_cache' is not defined

  - line 130: are there any issues with multiple simultaneous publishers?
    (Not necessarily just here, but in general.)


As long as they're all using the same repository object, no.  Otherwise,
probably.  I suspect you would encounter the same problem as you would
when running multiple pkg.depotd processes and publishing to them.

I was assuming bug 2696 would address this indirectly.

  - line 386: could you be overwriting a msg that the server provided?


Shouldn't be, as the server should have returned a 4xx/5xx error
triggering the previous clause.  However, it could be overwriting a
message from urllib2 or httplib.  I've changed this to include the
original value of 'msg'.

server/catalog.py:

  - line 141: os.path.basename(cmd) in ("pkg.depotd", "depot.py")?  But
    perhaps this should be triggered by some global variable the depot
    sets (environment variable, maybe?), instead of the command name?


I've added a fork_allowed parameter to SvrConfig which it passes on to
server/catalog.__init__, which is then checked.

server/config.py:

  - line 54: I'd say that if you have to tell it whether or not to create a
    repo, then it isn't really "auto".  "create_repo", maybe?


Actually, it's correct.  You're not telling it whether to create one,
you're telling if you want it to automatically create one if one does
not already exist at the specified location.  Only if auto_create is
true, and the repository directory is empty or does not exist, will a
repository be created.

  - line 102: we get here if the makedirs() up on line 97 failed, right?


Yes.

    But wouldn't an exception have been raised at that point?  You can also


Not if the directory was somehow removed immediately after it was
successfully created; though I know that's unlikely.

    do

        if os.path.exists(d):
            if self.auto_create:
                raise ...
            raise ...


So changed, except *not* os.path.exists ;-)

server/transaction.py:

  - line 117: what needs to be done in init?  And why can't it?


cfg should be assigned in __init__; I don't want to add the extra churn
necessary to make it happen at this moment.  There is a lot of cleanup
work left to do with making server/config sane and moving more of the
depot logic into depot.py.

  - line 212: previously, we had a bunch of stub code here that would deal
    with package completeness.  Shouldn't that stay?


Why?  I'm not a big fan of stub code; especially stub code that was
getting called and doing nothing.  If you'd like, I'd be happy to open a
bug with a summary of the stubs I removed.

  - line 216: what happens if this fails?  Previously we'd log it and drive
    on; now it throws an exception and the fact that the actual transaction
    succeeded is lost.


I was only considering the logging of the exception, and hadn't realised
that it was allowing a Transaction success to go through.  The main
problem here is that I can't rely on cherrypy to be present, nor do I
have any desire to reference a transport specific module in unrelated
code.  That and it was horribly wrong in catching all exceptions.

The best alternative I have for the moment is to catch EnvironmentError
and print to stderr using misc.emsg().  We need a logging framework or
some way for the modules to communicate information like this.

  - line 276: I know it was this way before, but how does this raise a
    KeyError?  I could see how line 273 could.


No idea.  Changed.

  - line 357: I'm not sure why the comment that was previously here about
    multiple packages moving to PUBLISHED at the same time was removed.


Overzealous cleaning; added back.

server/repository.py:
  - line 96: should this have a line break in here somewhere?


My opinion was that I don't know what client is using my messages, so I
should be using a minimal amount of formatting and relying on the client
to format messages as they see fit (think of the GUI wrapping the text
to fit its box according to font size for visually impaired users).

  - line 146: again, why the test?  Just try to create it.  If it fails,
    create one from scratch.


There already; but in this case, an EnvironmentError/ENOENT won't be
thrown as we're using SafeConfigParser(), so I've changed this a little
bit different accordingly.

  - line 214: would "if isinstance(last_modified, basestr):" be better?


basestring, but yes.

  - line 261: probably should have a blank line before it.  But doe these
    calls belong in repository?  Or should they go in depot?  I have the
    same feeling about the search-related stuff (well, maybe indexing makes
    sense here).


I believe the inc_*() calls belong here for now until I can sort out the
server/config.py mess.

I'm positive that the search related stuff should stay here as that is a
 set of functions that operates on repository data and I intended
consumers of the repository API to have access to it.

Thanks,
--
Shawn Walker
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss

Re: [pkg-discuss] [REVIEW] local publishing / pkgsend revamp

Reply via email to