[email protected] wrote:
On Mon, Sep 28, 2009 at 09:45:32PM -0700, Brock Pytlik wrote:
Webrev:
http://cr.opensolaris.org/~bpytlik/ips-7960-v1/
transport.py:

  - lines 456-460:  Why is this exception handler necessary?
Why wouldn't it be? This seemed like a reasonable place to translate the exception into something the client code was prepared to deal with.

It seemed like an obvious question to me, since this logic wasn't
present before.  Is this code more likely to encounter a permissions
error than the previous code?  Explain, please.
The exception handling was there. It was in the call to os._makedirs. Since lookup cannot raise an api exception (see below) it has to translate the exception into something meaningful to the api user. It is neither more nor less likely to take any error path than it was before, the location of the except clause has merely been shifted.
modules/file_layout:

  - The source under this directory hierarchy is < 300 lines of code.
    Could we coalesce this into a single file in modules/ or am I
    missing something?
I thought that since the layouts and file manager were fundamentally different pieces of functionality, they should go into different files.

Maybe, but if the code is small enough, and you're not designing a
subsystem, it may be more trouble than it's worth.

If you feel strongly, or others agree with you, I'll merge them, otherwise I'd prefer to leave them as is.
modules/file_manager.py:

  - line 46: This name conflicts with a class in another module:
    api_errors.PermissionsException.
I was under the impression that it was fine to have names that were the same but in different modules... that's why the module names are attached to them. In fact, there are other places in the code where this is done today. (see api_errors.py vs search_errors.py)

In other places in that code, you simply raise PermissionsException, I
believe.  Whether or not the interpreter tolerates this behavior, it's
confusing to humans.  I'd either rename this exception to something
else, or prefix it with something that makes it obvious that it belongs
to the FileManager. FMPermissionException, maybe?
Well, considering that it has no way to know about an api exception, I'm not really seeing the confusion. But since FMPermisisonException seems better to you, I'll change it.
  - lines 72-73: Should this be a class property, it doesn't look like
    it's going to vary between instances?
I don't think so. If it ever changed, it'd probably vary by instance, not by class. For example, if the layouts were stored on the file system (something I considered for a bit before deciding it was currently unnecessary) then part of initialization of an instance would be to load it off disk. Besides, in that world, if a multi threaded program were operating on different two images with different layouts, having it as a class variable would break things.

I find this answer confusing, since you don't allow the FileManager's
constructor to choose the layout.  It's essentially hard-coded on these
two lines.  It would make more sense to define the layout as properties
of the image. Then the image could pass the layouts it supports to the
FileManager.
Right now, I don't allow the FileManager's constructor to choose the layout. That doesn't mean it won't in the future. At one point, it was choosing the layout based on what was stored on disk, but I decided that over complicated things for no good reason at this point. Why would an image know what layouts it supports? Part of the point here is to encapsulate the file layout and not make the image deal with it, so that we can enable other functionality later if we deem it appropriate (like some form of cache clean up as you mentioned before).
  - line 91:  How does this function cope with files that may be
    readable in one tree, but not the other?
I'm not sure what you mean by "one tree but not the other." Assuming layout can replace tree, then right now it doesn't cope. What happens currently if someone goes in and makes selective files in the download directory not readable? If it's useful, I can stat the file on line 110 and ensure that it's readable.

The two scenarios this code supports right now is that either the source
and destination are both read/write, or both read only.  I'm asking if
it's reasonable to expect a deployment scenario where the source is
readonly but the destination is read/write.  (A read/write source with
readonly destination is a degenerate case of both sides being
read-only).  In the case where the source is read-only, you can't move
the file to the new layout, but you could copy it, but I don't see
support for this case.

No, I don't think that's a reasonable scenario at this time for the file manager to support. For now, I want to keep its job narrow and constrained, not make it a general repo seeding method, or anything else.
file_layout/layout.py:

  - Why do you allow Original to name itself "s1" by defining SHA1_160,
    but Fan256Once allows the class that instantiates it to give its
    name.  Wouldn't it be better to encode the name as a class property
    in both cases instead?
Original can only be SHA_160 since that's the old format and we're not going back to change it. Fan256Once can use SHA_256, when we add it, as well, hence it's an argument to the constructor.

I don't understand your response. All instances of Original must use
SHA1_160, correct?
Correct
  That would make it a candidate for a class propery,
at least in this case.
Yes, but the other layouts aren't amenable to a class property, so why special case this one?
In the case of Fan256Once, it's not clear to me
that setting hash type to "s1" is descriptive at all.
My understanding was I was free to choose whatever label I desired since it's entirely internal to the file manager. It's simply there to allow us to know what kind of hash algorithm was used so that if we use hash algo 1 (which produces N bits) and hash algo 2 (which produces N bits) at different times, we'll be able to avoid collisions. If you'd like to suggest a more meaningful name, by all means do.
  This object
doesn't check that it can use the hash type, nor is it apparent what
this heiararchy does with the value.
It doesn't check the hash input, you're correct. This is not the "support multiple hashes in our actions" wad. This is the "fix our layout" wad. Now, because I know some things are coming down the pipe, I've tried to lay some groundwork to ease that transition, where it made sense and most importantly, didn't greatly increase the complexity of this wad.
I'd suggest that you use a more
object-oriented approach and pass an object that knows how to perform
the hash function to each layout manager.
I don't understand.
Moreover, it seems like changing the layout function to work more
generically would be helpful.  Instead of having it be Fan256Once, how
about FanoutUniform() with the depth and the number of directories given
as arguments to the constructor.
Because I want to keep this as small and simple as possible. If you'd like a more generic layout class, you're free to write a subclass.
 You can state that each level will
have N directories, and if people want to write Fanouts that fan
differently at different depths, they can use this as a starting point.
This also means that it becomes trivial to implement FanNone, where all
files are in one directory.
It's trivial now to implement FanNone.
  - lines 59/60 and 65/66:  These should match the function signatures
    in the parent class that they are overriding.
They do except in one class they're declared static but not in the other. I can make them all non-static if you prefer. This works though.

As long as this doesn't create problems for future objects that inherit
from Original, I'm okay with this.
I hadn't considered that scenario. I'll just change it to all be non-static.
General comments:

  - You did this work as part of a performance investigation surrounding
    file layouts and directory access times.  It would help the rest of
    us provide useful feedback if you would include information about
    the experiments that you ran, and the data that you obtained.  In
    particular, how did you decide that one level of 256 directories was
    optimal?
Here's the openoffice spread sheet if you'd like to see the numbers (http://cr.opensolaris.org/~bpytlik/ips-dir-timing.ods).

I'm getting a 403 (Forbidden) when trying to access this URL.

Well that's strange, so am I. Any idea how to fix that, or how it happened?
In any case, I'll send it to you off list. If anyone else wants a copy, let me know.
  - You've created the file layout and file manager as part of an
    extensible architecture.  Would you provide more information about
    your design and architecture?  As an example, how would I go about
    adding a new layout to this design?
You'd add a new class to the layout file, and add it an instance of it as the first item in self.layouts in FileManager.

This isn't exactly what I'm after.  I'm more curious about your
architecture.  How are you envisioning that this will be extended?  Can
you document your design goals?  Could someone who looks at this
document figure out how to enhance the archtiecture later?  You're
checking this in as a new subsystem, but it's currently handling a
narrow set of tasks.
Yes, it's handling a narrow set of tasks, why is that an issue?

I have vague ideas of how this can be extended, but I haven't designed that b/c it's not in the scope of the bug I'm trying to fix: the layout of files on disk.

I would assume that anyone who wanted to enhance it later would have as much clue as I do how to go about it. I'm not sure what you're after. This isn't the be-all end-all to manage the cache. At best, it's laying the ground work to allow future work to do cache management. What that'll look like? I don't know. I have real questions whether any sort of time based cache makes sense.
  - The FileManager implements a cache of sorts, but I don't see any
    hooks for some of the typical operations that would be performed on
    a cache.  How do you determine what items to evict when the cache is
    full?  How do you plan to support multiple policies, both for sizing
    and eviction?  Are there any options for those who need to perform
    external cache maintenance?
Not this bug.

You might not be implementing support for it in this bug, but if you're
designing a subsystem to replace the existing, admittedly naive,
download cache, you should think about how you plan to address these
issues.
That's not what I've signed on for. If that's necessary, I'll vastly reduce my change set and simply rewhack the bits in misc to try and be slightly better, or hand it off to someone who wants to attack the larger issue of cache management.
I would argue that you should include this functionality, especially
since it's something that we've been lacking for a long time and do very
much need.  There was a post about it on our list today, in fact.

        
http://mail.opensolaris.org/pipermail/pkg-discuss/2009-September/017353.html
Yep, I saw that, still not this bug.
If you haven't thought about how you expect someone to accomplish these
tasks, you should make sure that the current design doesn't preclude
intelligent cache management, because that's the very next thing that
we'll have to do with this code.
In short, I believe the right place to do this is actually in the same place we do indexing. By default, what it should do is remove the files from the download area which have been removed or changed in the new packages. Only the imageplan has this knowledge, and I don't believe there's a sane way to put that approach into the file_manager b/c it doesn't have access to the plan. Changing the layout will make emptying the cache each time significantly faster, allowing users to more easily choose that setting. Of course, we'd also need a "keep everything setting" and there are probably other settings based on time/size/the phase of the moon that we may eventually want to support. All of these are not this bug. I see nothing in the current design that would prevent an implementation of any of these, if you do, please give concrete examples. In fact, I see this work making those steps easier b/c now there's a central location through which well behaving code will add and remove files, making it easier for there to be something that has knowledge of the state of the layout as a whole.

As I've said, I'm willing to rip this out and make the truly minimal set of changes needed to change the layout used. I don't think that's a good idea since I think this lays useful ground work, but if it doesn't, I can remove it all.

Brock
-j

_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss

Reply via email to