Re: [pkg-discuss] [REVIEW] local publishing / pkgsend revamp

Shawn Walker Tue, 10 Feb 2009 10:18:25 -0800

Danek Duvall wrote:

On Mon, Feb 09, 2009 at 03:33:29AM -0600, Shawn Walker wrote:

http://cr.opensolaris.org/~swalker/pkg-2691/


Any chance you can cook up an auxiliary webrev that does diffs between the
old repository.py and the new depot.py?  It's pretty hard to review these
changes as-is.  I haven't reviewed it yet; if you can't or don't want to do
this, say so, and I'll figure it out myself.


Sorry about that:
http://cr.opensolaris.org/~swalker/pkg-2691/repo-depot-diff.txt

I've removed anything I've addressed from your reply.

depot.py:

...

action/__init__.py:

  - line 106: why the change from position to position+1?

Because I didn't believe users are going to realise that the position is
zero-indexed; however, I'm willing to change this back if you think I'm
being too cautious.

  - line 158: perhaps "... and action.attrs.get(ka, None) is None"?  I'd
    speed test both, to see.


Was consistently a few hundred actions/sec. slower.
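
A minimal sketch of how two such tests could be timed against each other with timeit. The attribute dictionary, the key name, and the membership form of the test are assumptions made for illustration; the actual expression at line 158 is not shown in this thread.

```python
import timeit

# Made-up action attributes; "owner" plays the role of a missing key attribute.
attrs = {"path": "usr/bin/ls", "mode": "0555"}
ka = "owner"

def missing_by_membership():
    # Plain membership test (assumed form of the existing check).
    return ka not in attrs

def missing_by_get():
    # The form suggested in the review.
    return attrs.get(ka, None) is None

# The two agree here; they differ only if an attribute is stored as None.
assert missing_by_membership() == missing_by_get()

t_in = timeit.timeit(missing_by_membership, number=200000)
t_get = timeit.timeit(missing_by_get, number=200000)
print("membership: %.3fs  get(): %.3fs" % (t_in, t_get))
```

Absolute numbers depend on the machine and interpreter, so only the relative difference is meaningful.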

action/generic.py:

  - line 36: you might use cStringIO instead; it should be faster, and
    maybe more memory efficient.  Though I see at least in one place, you
    changed it from cStringIO, so perhaps you have a reason?

I did it because of the recent rumblings about not using C-based python
stuff. However, I've changed this back to cStringIO.


  - line 122: why is this a problem?

It causes the process to hang if pkg.size isn't provided. Call this
"slap the hand of people trying to break my stuff on purpose."

  - line 155, 167: these clauses can be pretty bad on the memory usage.


Sorry, I'm dense at the moment.  Explain a bit more?

    Where do we end up using them?  It might help to do something like

The only time this gets hit at the moment is during publishing to a
remote server. The receiving server, when it rebuilds the action, hands
off a request rfile object to the action class which only supports
read(). I was trying to be efficient and allow objects that support
seek() and tell() to get their size calculated in a better way than
sucking in all the data from a file.

Technically, I think I only need lines 164-166 if the object supports
"read" and I could dump the rest for our current needs.

        data = StringIO.StringIO(data.read())
        data.seek(0, 2)
        sz = data.tell()
        data.seek(0)

  - line 160: you might take a look at the suggestion in

        http://mail.python.org/pipermail/python-list/2006-March/375280.html

    to get SEEK_SET, etc.


Why do I need SEEK_SET again?
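
For reference, a minimal sketch of the size calculation being discussed, not the code under review. The helper name stream_size and the ReadOnly test class are hypothetical. Seekable objects are measured with seek() and tell(), using the named constants os.SEEK_END and os.SEEK_SET (available in the os module since Python 2.5) in place of the literal 2 and 0; objects that only support read() are buffered in memory.

```python
import io
import os

def stream_size(data):
    """Return (stream, size) for a file-like object.

    Seekable streams are measured in place. Streams that only offer
    read() are copied into memory, which is the expensive path.
    """
    try:
        pos = data.tell()
        data.seek(0, os.SEEK_END)      # jump to the end of the stream
        size = data.tell()             # offset of the end == total size
        data.seek(pos, os.SEEK_SET)    # restore the original position
        return data, size
    except (AttributeError, EnvironmentError):
        # No usable seek()/tell(): read everything and wrap it.
        payload = data.read()
        return io.BytesIO(payload), len(payload)

class ReadOnly(object):
    """Hypothetical stand-in for a request body that only supports read()."""
    def __init__(self, payload):
        self._buf = io.BytesIO(payload)

    def read(self, *args):
        return self._buf.read(*args)

seekable, n1 = stream_size(io.BytesIO(b"hello"))
buffered, n2 = stream_size(ReadOnly(b"hello world"))
print(n1, n2)
```

The fallback holds the whole payload in memory, so it only suits cases where the sender's data is known to be modest in size.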

action/license.py:

  - line 108: why are you doing this?  Ask for forgiveness, etc.  I'd say

Apparently a nun hit my head instead of my hand with a ruler, so I'm not
sure I follow completely, but...

I'm going to assume that you meant I should catch an ENOENT instead of
checking for os.path.exists here. So changed.

    the same about the size calculations in generic.py, but I see how that
    would get very indented (I'd still suggest you do it that way, though;
    perhaps splitting clauses out into separate functions).

I don't follow the comment about size calculations, unless the number of
statements within a try/except is the concern.
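
A minimal sketch of the "ask for forgiveness" pattern referred to above: attempt the open and treat ENOENT as "not there" instead of testing os.path.exists first. The function name read_license and the choice to return None for a missing file are assumptions for illustration, not what license.py does.

```python
import errno
import os
import tempfile

def read_license(path):
    """Return the file's contents, or None if the file does not exist."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except EnvironmentError as e:
        if e.errno == errno.ENOENT:
            return None     # missing file is an expected case
        raise               # permissions, I/O errors, etc. still propagate

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "license.txt")
before = read_license(path)          # file not created yet
with open(path, "wb") as f:
    f.write(b"CDDL")
after = read_license(path)
```

This avoids the window between an existence check and the open in which the file could appear or disappear.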

publish/transaction.py:

  - line 114: Why is this necessary?  Can't you just use __repo_cache?


NameError: global name '_TransactionFile__repo_cache' is not defined
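
The error above is Python's private name mangling at work: inside a class body, any identifier with two leading underscores (and no trailing pair) is rewritten to include the class name, even when it was meant to refer to a module-level variable. A minimal sketch that reproduces it; the method names are hypothetical, and the globals() lookup is just one possible way around the mangling, not necessarily what the patch does at line 114.

```python
__repo_cache = {}   # module-level name with two leading underscores

class TransactionFile(object):
    def mangled_lookup(self):
        # Compiled as a lookup of _TransactionFile__repo_cache,
        # which does not exist, so this raises NameError.
        return __repo_cache

    def explicit_lookup(self):
        # String keys are not mangled, so this finds the module global.
        return globals()["__repo_cache"]

try:
    TransactionFile().mangled_lookup()
    error_text = ""
except NameError as e:
    error_text = str(e)

print(error_text)
```

Renaming the module-level variable to use a single leading underscore would avoid the problem entirely.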

  - line 130: are there any issues with multiple simultaneous publishers?
    (Not necessarily just here, but in general.)

As long as they're all using the same repository object, no. Otherwise,
probably. I suspect you would encounter the same problem as you would
when running multiple pkg.depotd processes and publishing to them.


I was assuming bug 2696 would address this indirectly.

  - line 386: could you be overwriting a msg that the server provided?

Shouldn't be, as the server should have returned a 4xx/5xx error
triggering the previous clause. However, it could be overwriting a
message from urllib2 or httplib. I've changed this to include the
original value of 'msg'.

server/catalog.py:

  - line 141: os.path.basename(cmd) in ("pkg.depotd", "depot.py")?  But
    perhaps this should be triggered by some global variable the depot
    sets (environment variable, maybe?), instead of the command name?

I've added a fork_allowed parameter to SvrConfig which it passes on to
server/catalog.__init__, which is then checked.

server/config.py:

  - line 54: I'd say that if you have to tell it whether or not to create a
    repo, then it isn't really "auto".  "create_repo", maybe?

Actually, it's correct. You're not telling it whether to create one,
you're telling it whether you want one created automatically if one does
not already exist at the specified location. Only if auto_create is
true, and the repository directory is empty or does not exist, will a
repository be created.

  - line 102: we get here if the makedirs() up on line 97 failed, right?


Yes.

    But wouldn't an exception have been raised at that point?  You can also

Not if the directory was somehow removed immediately after it was
successfully created; though I know that's unlikely.

    do

        if os.path.exists(d):
            if self.auto_create:
                raise ...
            raise ...


So changed, except *not* os.path.exists ;-)
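
The auto_create behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in server/config.py: the function name ensure_repo_dir, the RepoMissingError exception, and the return values are all hypothetical, and a non-empty directory is simply assumed to hold an existing repository.

```python
import os
import tempfile

class RepoMissingError(Exception):
    """Hypothetical error for a missing repository when auto_create is off."""

def ensure_repo_dir(path, auto_create):
    """Return True if a repository directory was created, False if one exists."""
    exists = os.path.isdir(path)
    empty = not exists or not os.listdir(path)
    if not empty:
        return False            # something is already there; leave it alone
    if not auto_create:
        raise RepoMissingError("no repository at %s" % path)
    if not exists:
        os.makedirs(path)       # may still raise EnvironmentError
    return True

base = tempfile.mkdtemp()
repo = os.path.join(base, "repo")
created = ensure_repo_dir(repo, auto_create=True)
```

With auto_create false the same call raises instead of creating anything, which matches the "only if it does not already exist" reading given above.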

server/transaction.py:

  - line 117: what needs to be done in init?  And why can't it?

cfg should be assigned in __init__; I don't want to add the extra churn
necessary to make it happen at this moment. There is a lot of cleanup
work left to do with making server/config sane and moving more of the
depot logic into depot.py.

  - line 212: previously, we had a bunch of stub code here that would deal
    with package completeness.  Shouldn't that stay?

Why? I'm not a big fan of stub code; especially stub code that was
getting called and doing nothing. If you'd like, I'd be happy to open a
bug with a summary of the stubs I removed.

  - line 216: what happens if this fails?  Previously we'd log it and drive
    on; now it throws an exception and the fact that the actual transaction
    succeeded is lost.

I was only considering the logging of the exception, and hadn't realised
that it was allowing a Transaction success to go through. The main
problem here is that I can't rely on cherrypy to be present, nor do I
have any desire to reference a transport specific module in unrelated
code. That and it was horribly wrong in catching all exceptions.

The best alternative I have for the moment is to catch EnvironmentError
and print to stderr using misc.emsg(). We need a logging framework or
some way for the modules to communicate information like this.

  - line 276: I know it was this way before, but how does this raise a
    KeyError?  I could see how line 273 could.


No idea.  Changed.

  - line 357: I'm not sure why the comment that was previously here about
    multiple packages moving to PUBLISHED at the same time was removed.


Overzealous cleaning; added back.
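
A minimal sketch of the "log it and drive on" approach discussed for line 216: only EnvironmentError from the follow-up step is caught and reported on stderr, so the transaction's success is still returned, while other exceptions propagate. The function close_transaction, its callable arguments, and the local emsg stand-in (used here in place of misc.emsg()) are hypothetical.

```python
import sys

def emsg(text):
    # Local stand-in for an error-message helper that writes to stderr.
    sys.stderr.write(text + "\n")

def close_transaction(publish, follow_up):
    """Run publish(), then follow_up(); report follow-up I/O failures only."""
    publish()
    try:
        follow_up()
    except EnvironmentError as e:
        # The package is already published; report the problem and continue.
        emsg("post-publish step failed: %s" % e)
    return "PUBLISHED"

def ok():
    pass

def disk_full():
    raise IOError("disk full")

state = close_transaction(ok, disk_full)
```

Catching only EnvironmentError keeps programming errors such as a TypeError visible instead of silently swallowing them.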

server/repository.py:
  - line 96: should this have a line break in here somewhere?

My opinion was that I don't know what client is using my messages, so I
should be using a minimal amount of formatting and relying on the client
to format messages as they see fit (think of the GUI wrapping the text
to fit its box according to font size for visually impaired users).

  - line 146: again, why the test?  Just try to create it.  If it fails,
    create one from scratch.

There already; but in this case, an EnvironmentError/ENOENT won't be
thrown as we're using SafeConfigParser(), so I've changed this slightly
to account for that.

  - line 214: would "if isinstance(last_modified, basestr):" be better?


basestring, but yes.

  - line 261: probably should have a blank line before it.  But do these
    calls belong in repository?  Or should they go in depot?  I have the
    same feeling about the search-related stuff (well, maybe indexing makes
    sense here).

I believe the inc_*() calls belong here for now until I can sort out the
server/config.py mess.

I'm positive that the search related stuff should stay here as that is a
set of functions that operates on repository data and I intended
consumers of the repository API to have access to it.


Thanks,
--
Shawn Walker