Re: [pkg-discuss] Code Review request: 1949, 983, 2122

Shawn Walker Wed, 16 Jul 2008 18:23:33 -0700

Brock Pytlik wrote:
> Hi Shawn, thanks taking a look.
> If I've snipped a suggestion, that means "ok, I've made that change"


If I've snipped a comment, that means "whatever you want to do" :-)

> 
> Comments in line.
> 
> General comment:
> 
>          563 +                fmri_dir_path = os.path.join(self.imgdir, 
> "pkg",
>          564 +                                             
> fmri.get_dir_path())
> 
>    Indentation for line 564 should be four spaces in from the previous
>    line as far as I know.
>     
> That indentation follows the PEP8 standard as far as I can tell. The 
> indentation we're using in general is non-standard. I've changed to 
> match the existing code where you've noted it, but to be clear, this 
> isn't PEP8. Other than the 8 spaces issue, are we moving to PEP8 or 
> continuing to use our own standards? (Another place we're inconsistent 
> with the standard is that the standard says """ to close a block 
> docstring should have its own line and a blank line before it.)

I have no preference one way or another. I wasn't sure what the 
agreement was on the PEP8 thing. I'm happy as long as there is consensus 
on our future direction.

> Shawn Walker wrote:
>> 2008/7/13 Brock Pytlik <[EMAIL PROTECTED]>:
>>  

>>
>> http://cr.opensolaris.org/~bpytlik/ips-search/src/modules/client/client_query_engine.py.html
>>  
>>
>> ==========
>> Why is this called client_query_engine instead of just query_engine
>> given that it's in the client folder?
>>
>>   
> Because I find the naming convention easier to understand and clearer if 
> the module includes whether it's server or client (especially when, 
> server, client, and neither versions all exist) and I can't see any harm 
> in having it there. (Basically, client may be in the import, but if I'm 
> working with the code, and I see Catalog, or QueryEngine, I'm heading to 
> the modules named that, and when there are two modules named X, I tend 
> to go to the general one, which may or not be the one I actually want to 
> look at.)

Be that as it may, given that you call a module whatever you want when 
you import it (i.e. import pkg.client.query_engine as 
client_query_engine) I think it's somewhat redundant to have something 
called client_foo in the client folder. I would prefer you change it, 
solely on the basis of consistency with the rest of what we have; but 
I'll go along with whatever the group consensus is on this particular topic.

>> http://cr.opensolaris.org/~bpytlik/ips-search/src/modules/client/image.py.wdiff.html
>>  
>>
>> ==========.
>>
>>      1174 +                query = query_e.Query(args[0])
>>      1175 +                return qe.search(query)
>>
>> Why assign to query instead of just pass to qe.search?
>>   
> For clarity? I could also do "return 
> query_e.ClientQueryEngine(self.index_dir).search(query_e.Query(args[0]))"

No strong preference as long as it isn't a hot path. If python's guts 
are anything like perl's, I always tried to avoid the extra allocation 
if it was worth it.

>> http://cr.opensolaris.org/~bpytlik/ips-search/src/modules/client/imageplan.py.wdiff.html
>>  
>>
>> ==========
>>
>>       525 +                self.progtrack.actions_set_goal("Index
>> Phase", len(plan_info))
>>
>> Can you use a constant for the phase instead of hard-coding it?
>>   
> I could. I could put a constant INDEX_PHASE = "Index Phase" at the top 
> of the file and then use it here. It's not repeated anywhere so there's 
> no commonality. Is this really preferable over having the string inline? 
> If so I'll change it.

The only thought was whether this information would ever be used elsewhere.

>>       577 +                if force or (time.time() -
>> self.last_print_time) >= 0.05:
>>
>> Hrm. I'm not keen on the hard-coded 0.05. This seems like it
>> introduces a possible timing issue. Could you use a constant here at
>> the least?
>>
>> http://cr.opensolaris.org/~bpytlik/ips-search/src/modules/indexer.py.html
>> ==========
>> Lines 50, 57, 64 and 67 should probably just start with a '#' to be
>> consistent with general style.
>>
>> While your comments about the file format are helpful, I would prefer
>> to see a sample format first followed by a description of the sample.
>> That might allow you to remove a good portion of the text by relying
>> on the example to explain the format.
>>   
> # Here is an example of a line from the main dictionary, it is explained 
> below:
> # %gconf.xml (5,3,65689 => 249,202) (5,3,65690 => 249,202) (5,3,65691 => 
> 249,202)
> # (5,3,65692 => 249,202)
> #
> # The main dictionary has a more complicated format. Each line begins 
> with a
> # search token (%gconf.xml) followed by a list of mappings. Each mapping 
> takes
> # a token_type, action, and keyvalue tuple ((5,3,65689), (5,3,65690),
> # (5,3,65691), (5,3,65692)) to a list of pkg-stem, version pairs 
> (249,202) in
> # which the token is found in an action with token_type, action, and 
> keyvalues
> # matching the tuple. Further compaction is gained by storing everything 
> but
> # the token as an id which the other dictionaries can turn into 
> human-readable
> # content.
> 
> Here's what I changed it to. Is that what you had in mind?

That's rather helpful. Thanks.

>>  109                 self._max_mem_use =
>> float(os.environ.get("PKG_INDEX_MAX_RAM",
>>
>> You refer to this as max_mem everywhere in the code, so it would make
>> more sense for the environment variable to be PKG_INDEX_MAX_MEM as
>> well for consistency.
>>   
> Actually, I think it makes more sense to make the change in the other 
> direction. I've changed mem to ram.

That's fine -- I just prefer consistency :-)

>>  146                 self._tmp_dir = os.path.join(self._index_dir, "TMP")
>>
>> Personal preference: why uppercase for the directory name?
>>   
> Because it makes it stand out more, makes it clear it's a temporary 
> directory. I prefer it over tmp.
>>  168                             "Reading Existing Index", 
>> len(self._data_dict))
>>
>> I would like to see constants used for goals.
>>   
> See my above question, why?

Harder to typo and they are re-usable if needed elsewhere. But if 
they're "one hit wonders" leave them be.

>>  256                 """ Estimates RAM used based on size of added and
>>  257                 removed dictionaries. It returns an estimate in KB.
>>  258                 """
>>
>> Nitpick: s/RAM/memory/ for consistency.
>>
>> Nitpick: s/estimate in KB/estimated size in KiB/ :-)
>>   
> Just curious, why KiB? I haven't seen that used.

http://en.wikipedia.org/wiki/IEEE_1541

"After a trial period of two years, in 2005 IEEE 1541-2002 was elevated 
to a full-use standard by the IEEE Standards Association, and as of 
2007, it was scheduled for maintenance."

In short: to address the fact that manufacturers, consumers and others 
often disagreed on the interpretation of numbers the IEEE and other 
organisations decided enough was enough and standardised on a particular 
system for unit of measure.

>>  259                 return 0.5892 * dict_size +  -0.12295 * ids + \
>>  260                     -0.009595 * total_terms + 23512
>>
>> It would be nice to see these numbers as constants. On a side note, I
>> know that these numbers worked, but "magic values" always bother me...
>>   
> I'm going to leave these as is, b/c this is really a hack that should 
> die. I think leaving them in the code encourages someone to think very 
> carefully about what they're doing when they change them instead of 
> viewing them as constants to be tweaked. In other words, these ARE magic 
> values. To me, they're qualitatively different than the TIMOUT_SECS (for 
> example) and promoting them to constants at the top of the file gives 
> them a status they shouldn't have. It would also mean, other code could 
> potentially get at these numbers, something else I think we want to 
> avoid without VERY careful consideration. On a side note, I continue to 
> be open to suggestions on how to do this in Python 2.4 better. IIRC, Dan 
> mentioned that Python 2.6 (I think) has a way to determine the size of a 
> structure.

Can you add an appropriate "deep voodoo" comment above them then?

>> Is there a way to tie these numbers to the current version of the
>> index format such that they will become invalidated the next time we
>> change it? (In other words, force us to either review or update these
>> numbers when we change the format.)
>>   
> I'm open to suggestions. And, actually, it's not dependent on the index 
> format. It's dependent on the structures used to hold that information 
> in memory, so tying it to the index format (which to me means on disk 
> format) isn't the right thing to do.

I guess my assumption was that if our index format changed our 
structured had, but I realise now that that wouldn't necessarily be true.

>>  622                                 raise RuntimeError("Got unknown
>> input_type: " +
>>  623                                     str(input_type))
>>
>> Use %s instead?
>>   
> Changed but why? I find %s harder to read than + str(bleh)

I primarily stick with it for two reasons:

1) Trying to concatenate a string with a non-string raises an exception.

2) When concatenating multiple value it is easier to read.

In short, it's mostly personal preference. Most of the code I've seen in 
pkg thus far has used it instead of "+". Do what you will.

>>  676                         except Exception, e:
>>
>> You don't use the e here.
>>
>>  676                         except Exception, e:
>>  677                                 res = False
>> ...
>>  681                 if res == 0:
>>  682                         res = True
>>  683                 return res
>>
>> There's a possible bug here. If an exception happened and res was set
>> to False, line 681 will cause it to be set to True. Python considers 0
>> and False to be equivalent.
>>
>> However, you could do this instead which is shorter anyway and avoids
>> that possible issue (assuming you didn't intend that):
>> return res and True or False
>>
>>   
> Changed to this:
>                try:
>                        try:
>                                res = \
>                                    
> ss.consistent_open(self._data_dict.values(),
>                                        self._index_dir)
>                        except Exception:
>                                return False
>                finally:
>                        for d in self._data_dict.values():
>                                d.close_file_handle()
>                if res == 0:
>                        res = True
>                return res

Okay, I guess. The whole res == 0 still seems odd to me since if res is 
False or 0 you will set it to True.

>>    6         def __init__(self, cause):
>>    7                 self.cause = cause
>>
>> "Python in a Nutshell: A Desk Reference" recommends the following
>> syntax instead:
>>
>>        def __init__(self, *args):
>>                exceptions.Exception.__init__(self, *args)
>>
>> In addition, a discussion from last year indicates that the Python
>> developers (Gudio, et al.) believe that a recommendation to not rely
>> on self.args is appropriate. Instead, explicitly named arguments are
>> the preference [1].
>>
>> In that sense, what you have is good by using "cause", etc. but would
>> be better as a named parameter (e.g. cause = None?).
>>   
> Sorry, you've lost me here. This exception is constructed using exactly 
> one argument. Why would I use *args and call the init of the parent 
> class? I followed the example given in the python tutorial here: 
> http://docs.python.org/tut/node10.html#SECTION0010500000000000000000

You're supposed to call the init of the parent class to ensure that the 
Base exception class do whatever it would do normally. It's future 
proofing more than anything. It doesn't matter much here I suppose since 
the only thing it does is setup self.args.

It's more about whether you want to have your exceptions match standard 
exception behaviour.


>> http://cr.opensolaris.org/~bpytlik/ips-search/src/modules/search_storage.py.html
>>  
>>
>> ==========
>>
>>  261         def get_entry_count(self):
>>  262                 """ Returns the number of entities stored. """
>>  263                 return 0
>>
>> Maybe this should RaiseError instead of returning 0 to ensure derived
>> classes provide this?
>>   
> I have several derived classes which use this to commonly return 0. If 
> you'd prefer I raise an exception here and duplicate the code in a 
> couple of places, I will.

No, I just wasn't sure of the intent. That's fine.

>> http://cr.opensolaris.org/~bpytlik/ips-search/src/modules/server/repository.py.wdiff.html
>>  
>>
>> ==========
>>       185 +                        output += ("%s %s %s %s\n" % (l[0],
>> l[1], l[2], l[3]))
>>
>> Why not something like:
>>
>> output += "%s\n" % ' '.join(l) ?
>>
>> Is it safe to print all of the elements of l?
>>   
> I think it's clearer this way. It shows that exactly 4 things are being 
> printed (which is what the client side code is expecting). Also, it 
> makes clear that the things in l are ordered and that the order is 
> important for the client.

Actually that isn't clear to me. Can you add a comment?

>>  141                                 if not self._search_available:
>>  142                                         cherrypy.log("Search 
>> Available",
>>  143                                             "INDEX")
>>  144                                 self._search_available = True
>>
>> This is a little confusing to me. Can you add a comment explaining why
>> you're logging "Search Available" when it isn't and why you
>> immediately force it to true either way?
>>
>>   
> It is available. Setup makes it available. We're here if there's nothing 
> new to index.
> I've added this comment.
>                                # Since there is nothing to index, setup
>                                # the index and declare search available.
>                                # We only log this if this represents
>                                # a change in status of the server.

Better.

>>  197                 if (not self.searchdb_update_handle) or \
>>  198                     (self.searchdb_update_handle.poll() == None):
>>
>> I don't think you need the parentheses.
>>   
> I find it much clearer with the parens. I'd rather not require a person 
> reading the code to carry around the entire order of operations for 
> logic functions to accurately understand the code. If you feel strongly, 
> I'll use temporary variables and rewrite the if statement, or do 
> something else similar.

Mainly a thing of consistency. From what I can tell, we don't use 
parentheses in the pkg code unless necessary.

It's a matter of personal preference, so I won't say one way or another.

My only comment is on the consistency.

>>
>>       188 +                self.cfg.catalog.refresh_index()
>>
>> I'm somewhat concerned about the effect on a distro import using
>> solaris.py or any other bulk pkgsend operations that this is going to
>> have.
>>
>> I think we would need the ability to defer any index rebuild until a
>> depot restart or something similar.
>>   
> I'm open to suggestions on where to put this, but I don't understand the 
> issue really. Is your concern that this will slow down a distro import, 
> or that it will break a distro import? Why would you want to defer an 
> index rebuild when you can asynchronously update the index while waiting 
> for more data to come through the pipe?

My concern is the slow down and the additional function call overhead. 
The desire to rebuild the index is good. The place where it's being 
called is what I have a problem with.

>> http://cr.opensolaris.org/~bpytlik/ips-search/src/tests/cli/t_search.py.html 
>>
>> ==========
>>
>>  160                         f.write(p)
>>
>> Possibly include a newline along with the pathname?
>>   
> Sure, why?

Since you wrote the name of the file into the file, it would seem 
logical that it be isolated from the rest of the data if you were to 
view the file. However, I have no strong preference here.

> Thanks again for taking a look Shawn.
> I'll be posting an updated webrev in a little bit once I verify I 
> haven't broken anything and synced up with the gate.

Remember to post it as a *separate* webrev so you don't overwrite the 
old one.

Don't be offended by my style nitpicks. Most of them are not my personal 
preference, they're just comments based on the existing code and being 
consistent with it.

I've worked in projects that have one-space indentation to 8-space tab 
indentation. It doesn't really matter to me. Okay, I draw the line at 
GNU-style braces :-)

Thanks,
-- 
Shawn Walker
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss

Re: [pkg-discuss] Code Review request: 1949, 983, 2122

Reply via email to