Sorry, I left these things to come back to after my first pass through
Danek's comments, but then forgot to actually go back through them until
after I'd sent out the previous response.
Danek Duvall wrote:
> On Fri, Jul 18, 2008 at 02:54:02PM -0700, Brock Pytlik wrote:
>
>
>>> I think we need a text file in the doc directory explaining how all this
>>> works.
>>>
>> How what works?
>>
>
> Search. I don't know a thing about it, and so am at a disadvantage when it
> comes to reviewing for anything but correctness. Perhaps everyone else has
> experience in this arena, but I don't. And it doesn't seem to me that it's
> a problem that lends itself as easily to purely common-sense solutions as
> does much of the rest of IPS.
>
>
Ok, I'll start on this once I've gotten the next webrev up.
>>> - line 1356: perhaps call this rebuild_search_indexes?
>>>
>> I'd rather not. It's a client-side specific method. The server doesn't
>> use this path. That's why I specifically put "image" in the name.
>>
>
> It's in a client-side specific class, isn't it? We wouldn't (and don't)
> put "image" or "client" in the name of every method called only by the
> client, except possibly in something like catalog.py, which is shared
> between the client and the server.
>
> Am I missing something?
>
Basically, I view redundant information as a positive when it doesn't
cause problems. In this case, the project has two indexes, with two
different entry points for updating them, so adding the redundant
information that it's the client's indexes being updated rather than the
server's seemed reasonable to me. The same view lies behind my use of
parentheses, and behind naming files client_X and server_X even when the
client/server split is clear from context. I'd rather provide multiple
streams of information to the user/coder so that they can use whichever
is easiest for them, rather than forcing them to depend on the
import/file path (which might not be easily available), making them
carry around Python's order of operations, or requiring them to know
that image.py is a client-only module. But I seem to be alone on this,
so I'll change it when I notice I've gone down this path or when others
point it out to me. I just wanted to explain my thinking on the topic
before I dropped it.
>
>> I can change to rebuild_client_indexes if you'd prefer.
>>
>
> The point is to get "search" into the name.
>
Ah, I thought the objection was to "image". Anyway, it's now called
rebuild_search_index.
>>> repository.py:
>>>
>>> - line 185: Are you sure this doesn't require a protocol rev?
>>>
>>>
>> Well, in revision 302 (change set id:a5268194080e) image.py was changed to
>> allow the server to return 4 pieces of information and I believe the
>> corresponding change was put into client.py at revision 329(f549eab0d7b7).
>> I didn't change the server to send back four pieces of information instead
>> of 2 (though perhaps that's clearer now). If I'm packaging the information
>> incorrectly, I'll happily change it to package it right.
>>
>
> Huh. I'm confused, then. Servers are (and have been for some time)
> returning four-column data. So how is that happening if this is writing
> out four pieces of data? Or is l[1] in the current code just a three-piece
> string?
>
>
Well, here's the gate's code for parsing the result from the server:
for index, mfmri, action, value in itertools.chain(*searches):
        retcode = 0
        if first:
                if action and value:
                        msg("%-10s %-9s %-25s %s" % ("INDEX",
                            "ACTION", "VALUE", "PACKAGE"))
                else:
                        msg("%-10s %s" % ("INDEX", "PACKAGE"))
                first = False
        if action and value:
                msg("%-10s %-9s %-25s %s" % (index, action,
                    value, fmri.PkgFmri(str(mfmri)
                    ).get_short_fmri()))
        else:
                msg("%-10s %s" % (index, mfmri))
Frankly, I can't understand the old gate code for storing items into the
index, and I have no idea how it was managing to send back 4 pieces of
information. What I do know is that the client gate code above looks for
4 pieces of information in a certain order, and the new code makes it
clear that 4 pieces of information are needed, in that order.
>>> server/transaction.py:
>>>
>>> - line 188: This will make search results available for packages that
>>> aren't published yet. Is that your intent?
>>>
>> No, it's not. I tried to find a place in the code that is hit at the end of
>> each publishing step, I thought this was the right place since close has
>> been called and the server has told (or is about to tell) the client
>> they've published successfully. If this isn't the right place, I'll happily
>> move it elsewhere.
>>
>
> I guess accept_publish() is the right place for it. I'm surprised it
> wasn't there to begin with. My fault, I suppose. :)
>
>
It used to live in publish_package. I agree accept_publish() seems like
the right place. Indexing is now the last thing it does before sending
the published response to the client.
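To show just the ordering I mean, here's a toy sketch; every name below
is invented, and the real code in server/transaction.py looks nothing
like this:

```python
# Toy sketch: update the index before the published response, never after.
class FakeTransaction(object):
    def __init__(self):
        self.steps = []

    def update_search_index(self):
        self.steps.append("index")

    def send_published_response(self):
        self.steps.append("response")

    def accept_publish(self):
        # ... validate and record the package here ...
        self.update_search_index()       # last piece of real work,
        self.send_published_response()   # then tell the client

t = FakeTransaction()
t.accept_publish()
```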
>>> search_errors.py:
>>>
>>> - The PartialIndexing exception tells you more explicitly what to do
>>> than the others. You should probably do the same in the others.
>>> I'd also fix up the grammar and punctuation.
>>>
>> The information present is the information which is the same for both
>> client and server. Partial indexing has more in common between client and
>> server than the others, hence more information is given.
>>
>
> You mean that the other errors are server-side only, and thus don't need to
> be quite as user-friendly?
>
>
No, I mean the other errors have more client-specific information
provided in their handlers in client.py.
>>> - I'm not sure it's good practice to put so much information in the
>>> string representation of an exception; I'd let the catcher generate
>>> that text. However, if you're keen on being able to catch all the
>>> exceptions in a single handler and print out an appropriate message,
>>> then I think what I'd do is to have a class variable with a name
>>> common
>>> across the classes which holds the text, and catchers can print out
>>> that string. For i18n purposes, that string should have "%s" in it so
>>> you can insert e.cause even in a translated message.
>>>
>>>
>> Why would you do it that way?
>>
>
> What, make the catcher generate the text? Different clients might want to
> say something different -- the gui might have a suggestion that's more
> appropriate for a gui (or a gui user) than the cli.
>
My idea was to make the string representations as clear as possible, so
that if we traceback (not something that should happen, but certainly
something that can happen) the user has as much information as possible.
If a higher-level function has a better idea of what to do (client.py,
for example), it can catch the exception and add to the string
representation, or ignore it altogether. It can always pull self.cause
out if it wants that detail, but not the others.
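For what it's worth, I think the two goals can coexist. Here's a rough
sketch (the class name and message are invented, not the real
search_errors.py code) of a rich __str__ combined with the translatable
template-with-%s that Danek suggested:

```python
class ExampleIndexingException(Exception):
    """Invented for illustration; not a real search_errors.py class."""
    # A catcher can translate this template and substitute e.cause,
    # keeping "%s" available for i18n.
    template = "Could not access the search index because of: %s"

    def __init__(self, cause):
        Exception.__init__(self, cause)
        self.cause = cause

    def __str__(self):
        # Rich default text so a raw traceback is still informative.
        return self.template % self.cause

try:
    raise ExampleIndexingException("/var/pkg/index")
except ExampleIndexingException as e:
    # A client that knows better can build its own message from e.cause.
    text = str(e)
    cause = e.cause
```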
>
>>> search_storage.py:
>>>
>>> - line 37: It seems like this function would be a lot simpler if you
>>> used
>>> a file locking protocol instead. Also, why do you bail out if some
>>> index files are missing, but retry if the versions don't match? Is
>>> there a substantive difference between the two conditions?
>>>
>> If there's a difference, I can't remember what it is.
>>
>
> Ah, long-term development. :) If there's no good reason for the
> difference, then eliminate it.
>
Indeed, I've removed it.
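On the file-locking suggestion, a minimal version of what I understand
you to mean would look something like this; it's POSIX-only, the lock
file path is made up, and it's a sketch rather than anything I've
implemented:

```python
import fcntl
import os

lock_path = "/tmp/example_index.lock"  # invented path for the example

# Serialize index readers/writers with an advisory lock, instead of
# checking index file versions and retrying on mismatch.
fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
try:
    fcntl.flock(fd, fcntl.LOCK_EX)   # blocks until the lock is free
    # ... read or rewrite the index files here ...
finally:
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```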
>>> - line 50: this won't let you distinguish between None and 0 for timeout
>>> (where the former could mean no timeout, and the latter timeout
>>> immediately)
>>>
>> It's unclear to me why you'd choose a timeout of 0. That (depending on the
>> granularity of the clock) could mean you'd never try to open a file.
>>
>
> Well, you could put the test at the end of the loop instead, allowing for
> at least one attempt. The idea would be precisely that -- try once, and if
> it fails, don't try again.
>
>
I don't think we want to expose that behavior. I can think of no good
reason to try once then fail in this setting. If that's really what's
desired, I'll provide that functionality, probably through a different
argument.
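For reference, the variant Danek describes (test at the end of the loop,
so a timeout of 0 still gets exactly one attempt) would look something
like this sketch; the function and names are invented:

```python
import time

def open_with_retry(attempt, timeout):
    """Invented sketch: the timeout check sits at the end of the loop,
    so even timeout=0 gets exactly one attempt, while timeout=None
    would mean retry forever."""
    start = time.time()
    while True:
        try:
            return attempt()
        except IOError:
            pass
        if timeout is not None and time.time() - start >= timeout:
            raise IOError("gave up after %s seconds" % timeout)
        time.sleep(0.01)

calls = []
def always_fails():
    calls.append(1)
    raise IOError("not ready")

try:
    open_with_retry(always_fails, 0)
except IOError:
    pass
# With timeout=0, the attempt runs exactly once before giving up.
```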
>>> - line 170: lose the parens (change the tests, turn and into or). You
>>> can also say stat_info.st_mtime, etc. If you wanted, you could even
>>> say:
>>>
>>> if not (self._mtime, self._size) == \
>>> (stat_info.st_mtime, stat_info.st_size):
>>>
>>>
>> Changed to:
>> if self._mtime != stat_info[stat.ST_MTIME] or \
>>     self._size != stat_info[stat.ST_SIZE]:
>>
>
> Why "stat_info[stat.ST_MTIME]" over "stat_info.st_mtime"?
>
>
Hmm, I didn't realize the latter was an option (I glossed over that the
first time, I guess). Changed.
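So the check now reads along these lines; this is a standalone sketch of
the same (mtime, size) test, not the actual search_storage.py code:

```python
import os
import tempfile

def has_changed(path, mtime, size):
    # Attribute access (st.st_mtime) instead of stat_info[stat.ST_MTIME].
    st = os.stat(path)
    return st.st_mtime != mtime or st.st_size != size

# Quick demonstration with a throwaway file.
fd, path = tempfile.mkstemp()
os.write(fd, b"abc")
os.close(fd)
st = os.stat(path)
unchanged = has_changed(path, st.st_mtime, st.st_size)
```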
>>> - line 243: I'm a bit curious why you didn't just use the normal Python
>>> syntax for dictionaries and lists.
>>>
>>>
>> I don't understand what you mean. I chose a format that could be easily
>> parsed and yet fairly compact.
>>
>
> I mean that the format you chose is almost like what Python uses to print
> out dictionaries and lists, but not quite. It uses ":" to designate a
> mapping between key and value in a dictionary, rather than "->", for
> instance. The upside of using the Python syntax is that you could print
> out a dictionary without having to write any code to do so -- just "print
> mydict" (or "pprint.pprint(mydict)", if you wanted something a bit easier
> for a human to read). Parsing could simply be a call to eval(), if that
> were sufficiently safe, or some of the Python parsing routines otherwise.
>
> If you don't think that'd be easier, no worries. It's just a bit weird.
>
Ah, now I understand. I think I'm going to leave it as is, since the
actual internal storage is now:
dict -> list of (key, list of (fmri, version) tuples) tuples.
Also, when we're reading in the file, we never actually form the entire
data structure in memory; we read and process it a line at a time. So I
don't think we could eval() it or anything like that. Well, we could
eval each line, but then we couldn't just dump the dictionary to the
file.
I'm clearing up one or two things, and then I'll repost.
Brock
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss