Re: [pkg-discuss] Code Review request: 1949, 983, 2122

Brock Pytlik Fri, 18 Jul 2008 14:54:30 -0700

As with Shawn's comments, snips mean "Ok, I've changed it"

> Is the synopsis of buid 2122 truncated?
>   
Probably, I'm open to suggestions on how to prevent this from happening. 
It's a single line when I submit it.
> I don't know where the convention of putting a space at the beginning and
> end of every docstring began, but I'd like for it to end now.  No need to
> fix up existing usage, but don't introduce any new ones.
>
> I think we also probably settled on not putting spaces around keyword
> arguments.  At least be consistent in your own patch.  :)
>
> As discussed on Friday, I screwed the pooch on the distinction between _foo
> and __foo.  Please use __foo for symbols that really ought to be private to
> a class, and _foo for ones where they should be thought of as private, but
> where some outsiders actually need to access them.  Though if you want to
> drop the underscores entirely in some places, that's fine, too.  Also feel
> free to use properties where that makes sense.
>
> I think we need a text file in the doc directory explaining how all this
> works.
>   
How what works?
> attributes.py:
>
>   - I'm a bit confused by the way generate_indices() works here.  All the
>     dictionaries returned seem to have keys derived from the data, not
>     describing what kind of thing they are (a la "path", "basename", or
>     "hash" from the file action).  I would expect that a search hit on an
>     fmri would come back as
>
>         INDEX           ACTION   VALUE                  PACKAGE
>         fmri            set      pkg:/[EMAIL PROTECTED]           pkg:/[EMAIL 
> PROTECTED]
>
>     not as
>
>         INDEX           ACTION   VALUE                  PACKAGE
>         pkg:/[EMAIL PROTECTED]    set      ["foo", "1.2", None]   pkg:/[EMAIL 
> PROTECTED]
>
>     which is what it looks like it'll come back as.  Perhaps I'm just not
>     far enough along to see how this makes sense, but given that you didn't
>     touch any of the other actions' generate_indices() methods, I guess I'm
>     going to remain confused.
>
>   - line 67: I know we don't agree about the parens, but tough.  Lose 'em.
>   
Fine. I think this is wrong but clearly I'm in the minority on this.
>   - line 68, 88: we've been using double-quotes around strings, primarily,
>     except when the internal string has double-quotes itself.  There are a
>     bunch of other places where you have this, particularly in new files.
>
>   - line 69ff:
>
>         return dict(
>             (w, w)
>             for w in self.attrs["value"].split()
>         )
>   
Since you think this is clearer, I'll change it.
>   - line 76: This functionality should be part of the fmri module.  And
>     both this here and the PkgFmri (and Version) constructor should use the
>     same parsing methods, whichever one is faster.  Perhaps you don't need
>     to construct any objects, but that just means we need a single parser
>     that the constructor can use as well as places that don't need the
>     object.
>
> catalog.py:
>
>   - line 193: couldn't this go in the finally clause?
>   
No, because the the lock can't be released until after files have been 
closed and moved.
> client_query_engine.py:
>
>   - How confusing would it be if this were just client/query_engine.py?
>   
See my previous discussion with Shawn. I think it's confusing to have 
multiple files with the same names in the same project, especially when 
you're working with several of them at once. I've changed the name.
> image.py:
>
>   - line 456:
>
>         return [
>             (fmri, self.get_manifest_path(fmri))
>             for fmri in self.gen_installed_pkgs()
>         ]
>
>     Better yet, make it a generator by replacing the brackets with parens,
>     and make the caller listify it if they really need that.
>   
It has to be listified because the user needs a count of the number of 
things the generator will produce. I don't believe generators give that 
info. Changed to do this though. Why is it better to construct a 
generator which will immediately be turned into a list?
>   - line 1162ff:
>
>         Since the index directory will not reliably be updated when the
>         image root is, this should be called prior to using the index
>         directory.
>
>     However, when do we change self.imgdir?  If this is actually necessary,
>     shouldn't we just make imgdir a property, and have the setter modify
>     index_dir, too (which should be more efficient than the other way
>     around)?
>   
I don't know. imgdir is a public variable. In theory, it could be 
changed any time. I view this in general as a bug, but not something to 
be fixed in this wad.
>   - line 1356: perhaps call this rebuild_search_indexes?
>   
I'd rather not. It's a client-side specific method. The server doesn't 
use this path. That's why I specifically put "image" in the name. I can 
change to rebuild_client_indexes if you'd prefer.
> imageplan.py:
>
>   - line 512: I'd put blank lines around this one.
>
>   - line 515: I'd ditch this blank line
>
>   - line 516: I'd put this (indented) after the finally
>   
I think these comments were made against the old webrev.
> repository.py:
>
>   - line 185: Are you sure this doesn't require a protocol rev?
>   
Well, in revision 302 (change set id:a5268194080e) image.py was changed 
to allow the server to return 4 pieces of information and I believe the 
corresponding change was put into client.py at revision 
329(f549eab0d7b7). I didn't change the server to send back four pieces 
of information instead of 2 (though perhaps that's clearer now). If I'm 
packaging the information incorrectly, I'll happily change it to package 
it right.
> server/transaction.py:
>
>   - line 188: This will make search results available for packages that
>     aren't published yet.  Is that your intent?
>
>   
No its not. I tried to find a place in the code which is hit at the end 
of each publishing step, I thought this was the right place since close 
has been called and the server has told (or is about to tell) the client 
they've published successfully. If this isn't the right place, I'll 
happily move it elsewhere.
> search_errors.py:
>
>   - The PartialIndexing exception tells you more explicitly what to do than
>     the others.  You should probably do the same in the others.  I'd also
>     fix up the grammar and punctuation.
The information present is the information which is the same for both 
client and server. Partial indexing has more in common between client 
and server than the others, hence more information is given.
>   - I'm not sure it's good practice to put so much information in the
>     string representation of an exception; I'd let the catcher generate
>     that text.  However, if you're keen on being able to catch all the
>     exceptions in a single handler and print out an appropriate message,
>     then I think what I'd do is to have a class variable with a name common
>     across the classes which holds the text, and catchers can print out
>     that string.  For i18n purposes, that string should have "%s" in it so
>     you can insert e.cause even in a translated message.
>   
Why would you do it that way?
And I'll switch to using %s.
> search_storage.py:
>
>   - line 37: It seems like this function would be a lot simpler if you used
>     a file locking protocol instead.  Also, why do you bail out if some
>     index files are missing, but retry if the versions don't match?  Is
>     there a substantive difference between the two conditions?
>   
If there's a difference, I can't remember what it is.
Perhaps I don't have a good understanding of how to apply file locking. 
I want the following characteristics:
Readers always see consistent versions of all files
Many readers can see the files at the same time
A writer should get immediate priority


I'm open to suggestions, but this does work.
>  
>
>   - line 50: this won't let you distinguish between None and 0 for timeout
>     (where the former could mean no timeout, and the latter timeout
>     immediately)
>   
It's unclear to me why you'd choose a timeout of 0. That (depending on 
the granularity of the clock) could mean you'd never try to open a file. 
In short, don't set a timeout of 0. I'll add that as a comment. I've 
changed it to timeout !=None.
>
>   - line 157: Why aren't you using self._file_handle?
>   
Because the the output file handle isn't always the same as the input 
file handle.
>   - line 168: why this assert?
>   
For debugging? I've removed it
>   - line 170: lose the parens (change the tests, turn and into or).  You
>     can also say stat_info.st_mtime, etc.  If you wanted, you could even
>     say:
>
>         if not (self._mtime, self._size) == \
>             (stat_info.st_mtime, stat_info.st_size):
>   
Changed to:
                if self._mtime != stat_info[stat.ST_MTIME] or \
                    self._size != stat_info[stat.ST_SIZE]
>   - line 243: I'm a bit curious why you didn't just use the normal Python
>     syntax for dictionaries and lists.
>   
I don't understand what you mean. I choose a format that could be easily 
parsed and yet fairly compact.
>   - line 280-283: The first time through, we won't do anything except set
>     old_suffix.  Each time after that, we remove the old file.  Why do we
>     preserve the old file the first time we call shift_file(), but none of
>     the rest?
>
>   
We need to keep the most recent old file around because it will be read 
in again. Previous old files can be blown away because they're no longer 
used. The file two generations back is removed, not the most recent version.
> Danek
>   
> And here's part II ...
>
>   
> search_storage.py:
>
>   - line 299: loe?
>   
list of empties, changed it
>   - line 328, et al: Changing the values to the empty string isn't
>     deleting, exactly.  Is there a reason you're doing that rather than
>     doing "del self._list[foo]"?
>   
Yes, del removes the entry from the list, which isn't the desired 
behavior because it changes the indexes in the rest of the list.
>   - line 403: Is there a structural reason we would end up with a blank
>     line, rather than just removing the line in the first place?
>   
Yes, because it would mess up the indexing.
>   - line 535: the entry count is always zero?
>   
I'll change the names of all "get_entry_count" functions to be something 
like "count_entries_removed_during_partial_indexing" since this seems to 
be the source of confusion.
> server/server_catalog.py:
>
>   - line 152: you're not actually following your own advice, since you're
>     also calling it from depot.py.  Of course, you have to, but you might
>     want to rewrite the comment a bit.
>   
Well, you should only get to that line in depot.py by going through the 
refresh index.
Changed to:
                Note: Only one instance of this method should be running.
                External locking is expected to ensure this behavior.
>   - line 157: there's a fair amount of duplication between this method and
>     refresh_index().  Is there any way to reduce that?
>   
Not in any meaningful way that I see, but I'm open to suggestions.
>   - line 178: pkg.catalog.Catalog.build_catalog() isn't used anywhere but
>     here -- you might as well just pull that method into this class.
>   
It's used in the constructor of Catalog. Are you saying that should be 
pulled into the server_catalog too? Ie, the client never needs to use 
build_catalog?
> server_query_engine.py:
>
>   - line 57, 78: why not close the file handle after this line, rather than
>     in the finally clause?
>   
Because if 57 or 78 throws an exception, the file handle will be left 
open. Having one general, fail safe place to close the file was simpler 
than closing it immediately, then also trying to close it again later. I 
couldn't see the harm in leaving them open for a short time.
> t_search.py:
>
>   - I may be missing it, but I don't see a test for bug 983, which would be
>     useful to have.
>   
I've added one.

I'm going to fix whatever I've broken and post a (versioned) webrev.

Thanks for taking a look Danek.

Brock

_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss

Re: [pkg-discuss] Code Review request: 1949, 983, 2122

Reply via email to