Re: [pkg-discuss] [REVIEW] local publishing / pkgsend revamp

Danek Duvall Tue, 10 Feb 2009 10:28:16 -0800

On Tue, Feb 10, 2009 at 01:03:43AM -0600, Shawn Walker wrote:

> http://cr.opensolaris.org/~swalker/pkg-2691/repo-depot-diff.txt


Thanks.

> I've removed any thing I've addressed from your reply.
>
>> depot.py:
> ...
>> action/__init__.py:
>>
>>   - line 106: why the change from position to position+1?
>
> Because I didn't believe users are going to realise that the position is 
> zero-indexed; however, I'm willing to change this back if you think I'm 
> being too cautious.

I think so -- end-users are more likely to look at the arrow pointing to
where the problem is, and those folks looking at the number are more likely
to be programmers dealing with zero-indexed arrays and strings, anyway.
I'd rather wait to change this when we have evidence (bugs) that people are
confused by it.

>>   - line 158: perhaps "... and action.attrs.get(ka, None) is None"?  I'd
>>     speed test both, to see.
>
> Was consistently a few hundred actions/sec. slower.

Okay, thanks.

>> action/generic.py:
>>
>>   - line 36: you might use cStringIO instead; it should be faster, and
>>     maybe more memory efficient.  Though I see at least in one place, you
>>     changed it from cStringIO, so perhaps you have a reason?
>
> I did it because of the recent rumblings about not using C-based python 
> stuff.  However, I've changed this back to cStringIO.

I think the C-based Python stuff is more about pkg(5) delivering its own C
modules, rather than relying on the Python standard library.  Most of the
core modules are implemented in C, anyway.

>>   - line 122: why is this a problem?
>
> It causes the process to hang if pkg.size isn't provided.  Call this "slap 
> the hand of people trying to break my stuff on purpose."

How does it hang?  Reading from /dev/null should return EOF immediately.
I've used this quite a bit to put dummy test files into dummy packages.

>>   - line 155, 167: these clauses can be pretty bad on the memory usage.
>
> Sorry, I'm dense at the moment.  Explain a bit more?

You're reading the entire file into memory, which is a technique you've
explicitly gotten rid of in other places, for the better.  So why do we do
it here?

>>     Where do we end up using them?  It might help to do something like
>
> The only time this gets hit at the moment is during publishing to a remote 
> server.  The receiving server, when it rebuilds the action, hands off a 
> request rfile object to the action class which only supports read().  I was 
> trying to be efficient and allow objects that support seek() and tell() to 
> get their size calculated in a better way than sucking in all the data from 
> a file.

That's fine.  I'd been laboring under the mistaken impression that some
streams could only seek forwards, but apparently I'm wrong, so the
seek/tell bit is fine.

> Technically, I think I only need lines 164-166 if the object supports 
> "read" and I could dump the rest for our current needs.

<shrug> What's there is fine.  If we run into more memory issues, we could
always write the data to a temporary file in chunks.

>>         data = StringIO.StringIO(data.read())
>>         data.seek(0, 2)
>>         sz = data.tell()
>>         data.seek(0)
>>
>>   - line 160: you might take a look at the suggestion in
>>
>>         
>> http://mail.python.org/pipermail/python-list/2006-March/375280.html
>>
>>     to get SEEK_SET, etc.
>
> Why do I need SEEK_SET again?

So you're not using magic constants like 2.

>> action/license.py:
>>
>>   - line 108: why are you doing this?  Ask for forgiveness, etc.  I'd say
>
> Apparently a nun hit my head instead of my hand with a ruler, so I'm not 
> sure I follow completely, but...

Search for "easier to ask forgivness".

> I'm going to assume that you meant I should catch an ENOENT instead of 
> checking for os.path.exists here.  So changed.

Yes.  Forgiveness is the general Pythonic way, though if you can determine
that it's a performance problem (in a key area), or otherwise leads to
unmaintainable code, choose permission.

>>     the same about the size calculations in generic.py, but I see how that
>>     would get very indented (I'd still suggest you do it that way, though;
>>     perhaps splitting clauses out into separate functions).
>
> I don't follow the comment about size calculations, unless the number of 
> statements within a try/except is the concern.

You could rewrite the size calculation in generic.py as (roughly):

    try:
        sz = data.size
    except AttributeError:
        try:
            fs = os.fstat(data.fileno())
            sz = fs.st_size
        except (AttributeError, TypeError):
            try:
                try:
                    data.seek(0, 2)
                    sz = data.tell()
                    data.seek(0)
                except (AttributeError, TypeError):
                    d = data.read()
                    sz = len(d)
                    data = cStringIO.StringIO(d)
            except (AttributeError, TypeError):
                sz = len(data)
                data = cStringIO.StringIO(data)

which just tries to do something, and falls back to the next alternate
mechanism if that didn't work.

Again, forgiveness instead of permission.  I've read that it's supposed to
be faster in Python to do this, but it's not clear to me that it is.

>> publish/transaction.py:
>>
>>   - line 114: Why is this necessary?  Can't you just use __repo_cache?
>
> NameError: global name '_TransactionFile__repo_cache' is not defined

Oh, right.  The "__" hides it.

>>   - line 130: are there any issues with multiple simultaneous publishers?
>>     (Not necessarily just here, but in general.)
>
> As long as they're all using the same repository object, no.  Otherwise, 
> probably.  I suspect you would encounter the same problem as you would when 
> running multiple pkg.depotd processes and publishing to them.
>
> I was assuming bug 2696 would address this indirectly.

Okay.

>> server/config.py:
>
>>   - line 102: we get here if the makedirs() up on line 97 failed, right?
>
> Yes.
>
>>     But wouldn't an exception have been raised at that point?  You can 
>> also
>
> Not if the directory was somehow removed immediately after it was 
> successfully created; though I know that's unlikely.

I think we should assume that the directories are still there.  They could
disappear at any time, so protecting against their removal immediately
after creation seems a bit silly.

I'd just create all of the directories if auto_create is set, ignoring any
errors for directories that already exist, and not check existence either
before or afterwards.

>> server/transaction.py:
>>
>>   - line 117: what needs to be done in init?  And why can't it?
>
> cfg should be assigned in __init__; I don't want to add the extra churn
> necessary to make it happen at this moment.  There is a lot of cleanup
> work left to do with making server/config sane and moving more of the
> depot logic into depot.py.

Okay.

>>   - line 212: previously, we had a bunch of stub code here that would deal
>>     with package completeness.  Shouldn't that stay?
>
> Why?  I'm not a big fan of stub code; especially stub code that was getting 
> called and doing nothing.  If you'd like, I'd be happy to open a bug with a 
> summary of the stubs I removed.

A bug would be fine.  I just don't want to lose the requirement to do this.
Stephen probably won't forget, but I think that the rest of us probably
need a reminder stashed somewhere.  I think it's less likely to get lost in
the code, but a bug would probably be okay instead.

>>   - line 216: what happens if this fails?  Previously we'd log it and
>>     drive on; now it throws an exception and the fact that the actual
>>     transaction succeeded is lost.
>
> I was only considering the logging of the exception, and hadn't realised 
> that it was allowing a Transaction success to go through.  The main problem 
> here is that I can't rely on cherrypy to be present, nor do I have any 
> desire to reference a transport specific module in unrelated code.  That 
> and it was horribly wrong in catching all exceptions.
>
> The best alternative I have for the moment is to catch EnvironmentError and 
> print to stderr using misc.emsg().  We need a logging framework or some way 
> for the modules to communicate information like this.

Sure.  Or even just ignore it.  I just want to make sure the operation
succeeds even if rmtree() fails.

>> server/repository.py:
>>   - line 96: should this have a line break in here somewhere?
>
> My opinion was that I don't know what client is using my messages, so I 
> should be using a minimal amount of formatting and relying on the client to 
> format messages as they see fit (think of the GUI wrapping the text to fit 
> its box according to font size for visually impaired users).

Okay.  I keep dreaming about a widget that will display text in the
terminal without having to have explicit linebreaks.  Kind of like
textwrap, but probably a little richer.

>>   - line 261: probably should have a blank line before it.  But doe these
>>     calls belong in repository?  Or should they go in depot?  I have the
>>     same feeling about the search-related stuff (well, maybe indexing
>>     makes sense here).
>
> I believe the inc_*() calls belong here for now until I can sort out the 
> server/config.py mess.

Sure.

> I'm positive that the search related stuff should stay here as that is a  
> set of functions that operates on repository data and I intended consumers 
> of the repository API to have access to it.

I suppose I could be convinced.

On line 333, is the "_0" intentional?

Danek
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss

Re: [pkg-discuss] [REVIEW] local publishing / pkgsend revamp

Reply via email to