On Fri, Jul 18, 2008 at 02:54:02PM -0700, Brock Pytlik wrote:

>> I think we need a text file in the doc directory explaining how all this
>> works.
>
> How what works?

Search.  I don't know a thing about it, and so am at a disadvantage when it
comes to reviewing for anything but correctness.  Perhaps everyone else has
experience in this arena, but I don't.  And it doesn't seem to me that it's
a problem that lends itself as easily to purely common-sense solutions as
does much of the rest of IPS.

>> image.py:
>>
>>   - line 456:
>>
>>         return [
>>             (fmri, self.get_manifest_path(fmri))
>>             for fmri in self.gen_installed_pkgs()
>>         ]
>>
>>     Better yet, make it a generator by replacing the brackets with parens,
>>     and make the caller listify it if they really need that.
>>   
> It has to be listified because the user needs a count of the number of 
> things the generator will produce. I don't believe generators give that 
> info. Changed to do this though. Why is it better to construct a generator 
> which will immediately be turned into a list?

It isn't.  It's only useful if the things that call it can use it as a
generator.  If everything needs it in list form, then just do it that way
to begin with.  I hadn't noticed that this was required.

>>   - line 1162ff:
>>
>>         Since the index directory will not reliably be updated when the
>>         image root is, this should be called prior to using the index
>>         directory.
>>
>>     However, when do we change self.imgdir?  If this is actually
>>     necessary, shouldn't we just make imgdir a property, and have the
>>     setter modify index_dir, too (which should be more efficient than
>>     the other way around)?
>>   
> I don't know. imgdir is a public variable. In theory, it could be changed 
> any time. I view this in general as a bug, but not something to be fixed in 
> this wad.

Okay.

>>   - line 1356: perhaps call this rebuild_search_indexes?
>
> I'd rather not. It's a client-side specific method.  The server doesn't
> use this path. That's why I specifically put "image" in the name. 

It's in a client-side specific class, isn't it?  We wouldn't (and don't)
put "image" or "client" in the name of every method called only by the
client, except possibly in something like catalog.py, which is shared
between the client and the server.

Am I missing something?

> I can change to rebuild_client_indexes if you'd prefer.

The point is to get "search" into the name.

>> imageplan.py:
>>
>>   - line 512: I'd put blank lines around this one.
>>
>>   - line 515: I'd ditch this blank line
>>
>>   - line 516: I'd put this (indented) after the finally
>>   
> I think these comments were made against the old webrev.

Likely.

>> repository.py:
>>
>>   - line 185: Are you sure this doesn't require a protocol rev?
>>   
> Well, in revision 302 (change set id:a5268194080e) image.py was changed to 
> allow the server to return 4 pieces of information and I believe the 
> corresponding change was put into client.py at revision 329(f549eab0d7b7). 
> I didn't change the server to send back four pieces of information instead 
> of 2 (though perhaps that's clearer now). If I'm packaging the information 
> incorrectly, I'll happily change it to package it right.

Huh.  I'm confused, then.  Servers are (and have been for some time)
returning four-column data.  So how is that happening if this is writing
out four pieces of data?  Or is l[1] in the current code just a three-piece
string?

>> server/transaction.py:
>>
>>   - line 188: This will make search results available for packages that
>>     aren't published yet.  Is that your intent?
>
> No its not. I tried to find a place in the code which is hit at the end of 
> each publishing step, I thought this was the right place since close has 
> been called and the server has told (or is about to tell) the client 
> they've published successfully. If this isn't the right place, I'll happily 
> move it elsewhere.

I guess accept_publish() is the right place for it.  I'm surprised it
wasn't there to begin with.  My fault, I suppose.  :)

>> search_errors.py:
>>
>>   - The PartialIndexing exception tells you more explicitly what to do
>>     than the others.  You should probably do the same in the others.
>>     I'd also fix up the grammar and punctuation.
> The information present is the information which is the same for both 
> client and server. Partial indexing has more in common between client and 
> server than the others, hence more information is given.

You mean that the other errors are server-side only, and thus don't need to
be quite as user-friendly?

>>   - I'm not sure it's good practice to put so much information in the
>>     string representation of an exception; I'd let the catcher generate
>>     that text.  However, if you're keen on being able to catch all the
>>     exceptions in a single handler and print out an appropriate message,
>>     then I think what I'd do is to have a class variable with a name 
>> common
>>     across the classes which holds the text, and catchers can print out
>>     that string.  For i18n purposes, that string should have "%s" in it so
>>     you can insert e.cause even in a translated message.
>>   
> Why would you do it that way?

What, make the catcher generate the text?  Different clients might want to
say something different -- the gui might have a suggestion that's more
appropriate for a gui (or a gui user) than the cli.

>> search_storage.py:
>>
>>   - line 37: It seems like this function would be a lot simpler if you 
>> used
>>     a file locking protocol instead.  Also, why do you bail out if some
>>     index files are missing, but retry if the versions don't match?  Is
>>     there a substantive difference between the two conditions?
>
> If there's a difference, I can't remember what it is.

Ah, long-term development.  :)  If there's no good reason for the
difference, then eliminate it.

> Perhaps I don't have a good understanding of how to apply file locking. I 
> want the following characteristics:
> Readers always see consistent versions of all files
> Many readers can see the files at the same time
> A writer should get immediate priority

I think K can help here -- he seemed eager to help out on locking problems.  :)

>>   - line 50: this won't let you distinguish between None and 0 for timeout
>>     (where the former could mean no timeout, and the latter timeout
>>     immediately)
>
> It's unclear to me why you'd choose a timeout of 0. That (depending on the 
> granularity of the clock) could mean you'd never try to open a file.

Well, you could put the test at the end of the loop instead, allowing for
at least one attempt.  The idea would be precisely that -- try once, and if
it fails, don't try again.

>>   - line 170: lose the parens (change the tests, turn and into or).  You
>>     can also say stat_info.st_mtime, etc.  If you wanted, you could even
>>     say:
>>
>>         if not (self._mtime, self._size) == \
>>             (stat_info.st_mtime, stat_info.st_size):
>>   
> Changed to:
>                if self._mtime != stat_info[stat.ST_MTIME] or \
>                    self._size != stat_info[stat.ST_SIZE]

Why "stat_info[stat.ST_MTIME]" over "stat_info.st_mtime"?

>>   - line 243: I'm a bit curious why you didn't just use the normal Python
>>     syntax for dictionaries and lists.
>>   
> I don't understand what you mean. I choose a format that could be easily 
> parsed and yet fairly compact.

I mean that the format you chose is almost like what Python uses to print
out dictionaries and lists, but not quite.  It uses ":" to designate a
mapping between key and value in a dictionary, rather than "->", for
instance.  The upside of using the Python syntax is that you could print
out a dictionary without having to write any code to do so -- just "print
mydict" (or "pprint.pprint(mydict)", if you wanted something a bit easier
for a human to read).  Parsing could simply be a call to eval(), if that
were sufficiently safe, or some of the Python parsing routines otherwise.

If you don't think that'd be easier, no worries.  It's just a bit weird.

>>   - line 280-283: The first time through, we won't do anything except set
>>     old_suffix.  Each time after that, we remove the old file.  Why do we
>>     preserve the old file the first time we call shift_file(), but none of
>>     the rest?
>
> We need to keep the most recent old file around because it will be read in 
> again. Previous old files can be blown away because they're no longer used. 
> The file two generations back is removed, not the most recent version.

Right.  Gotcha.

>>   search_storage.py:
>>
>>   - line 299: loe?
>>   
> list of empties, changed it

If "loe" is useful because of it's length, that's fine.  A comment would be
sufficient.

>>   - line 328, et al: Changing the values to the empty string isn't
>>     deleting, exactly.  Is there a reason you're doing that rather than
>>     doing "del self._list[foo]"?
>>   
> Yes, del removes the entry from the list, which isn't the desired behavior 
> because it changes the indexes in the rest of the list.

Okay, just making sure.

>>   - line 535: the entry count is always zero?
>>   
> I'll change the names of all "get_entry_count" functions to be something 
> like "count_entries_removed_during_partial_indexing" since this seems to be 
> the source of confusion.

I think that changing the docstring would likely be sufficient.

>> server/server_catalog.py:
>>
>>   - line 152: you're not actually following your own advice, since you're
>>     also calling it from depot.py.  Of course, you have to, but you might
>>     want to rewrite the comment a bit.
>>   
> Well, you should only get to that line in depot.py by going through the 
> refresh index.
> Changed to:
>                Note: Only one instance of this method should be running.
>                External locking is expected to ensure this behavior.

You might continue to suggest that calling refresh_index() is preferred.

>>   - line 178: pkg.catalog.Catalog.build_catalog() isn't used anywhere but
>>     here -- you might as well just pull that method into this class.
>>   
> It's used in the constructor of Catalog. Are you saying that should be 
> pulled into the server_catalog too? Ie, the client never needs to use 
> build_catalog?

I missed that, sorry.

>> server_query_engine.py:
>>
>>   - line 57, 78: why not close the file handle after this line, rather
>>     than in the finally clause?
>>   
> Because if 57 or 78 throws an exception, the file handle will be left open. 
> Having one general, fail safe place to close the file was simpler than 
> closing it immediately, then also trying to close it again later. I 
> couldn't see the harm in leaving them open for a short time.

Okay.

Thanks,
Danek
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss

Reply via email to