Re: [pkg-discuss] API Code Review

Tom Mueller (pkg-discuss) Mon, 29 Sep 2008 13:12:31 -0700

Brock,
Please see inline.

Brock Pytlik wrote:
> Tom Mueller (pkg-discuss) wrote:
>> Brock Pytlik wrote:
>>  
>>> Tom Mueller (pkg-discuss) wrote:
>>>    
>>>> Brock,
>>>>
>>>> Please consider these comments of a more general nature (looks like 
>>>> Shawn got the detailed ones).
>>>>
>>>> Please change the API to be more object oriented:
>>>> - The term "PkgClient" doesn't represent what the thing is that is 
>>>> being manipulated here.  The object here is an Image.  If a client 
>>>> is going to manipulate multiple images at the same time, it would 
>>>> need multiple of these objects, but there is still only one client.
>>>>         
>>> I agree. I'm open to better suggestions.
>>>     
>> I would suggest Image.
>>   
> Since we already have an Image, how about ImageInterface?
We don't have a pkg.Image or a pkg.client.Image.  The problem with 
ImageInterface is the same as having a module named "api".  From the 
perspective of the user of an API, the API is all that there is.  So 
using "Interface" or "api" in the names is redundant.


>>>> - The PkgClient.describe method returns a PlanDescription, so it 
>>>> would seem that the thing being described in a "Plan" but there 
>>>> isn't any "Plan" object in the API.
>>>>         
>>> Why would there be a plan object in the API that the outside world 
>>> should know about? You tell the API to make a plan. If you want to, 
>>> you can ask for a description of what that plan will do (which gives 
>>> the PlanDescription). Hiding the plan is one way we're refactoring 
>>> the interfaces to present a consistent and simple interface to the 
>>> client. If you can present an example where the client needs to 
>>> modify the plan, I'd be very interested to hear it.
>>>     
>> A typical operation for a GUI is:
>>
>> - make a plan
>> - get some information from the plan and query the user, e.g.,
>> -- confirm that the dependencies should be installed too?
>> -- confirm the license?
>> - execute the plan
>>
>> If the plan is a first class object in the interface, then the first 
>> step is an operation on the Image that returns a Plan, and then the 
>> remaining steps are operations on the Plan. In the internal API, 
>> these are separated into the Image object and the ImagePlan object.  
>> Merging all of this into a single class doesn't necessarily make 
>> things consistent and simple - it just makes a single class with more 
>> methods.
>>
>>   
> And if a Plan is a first class object, then we when reimplement the 
> plan, which a piece of code that changes fairly rapidly as we add 
> backtracking, make evaluation less memory intensive and faster, 
> etc..., then we have to carefully coordinate what we're doing with 
> clients.
Coordination is required only if the interface to the object changes.  
We can reimplement all we want inside any of the modules.

And if adding backtracking, for example, is something that the client 
app needs to know about, then the interface is going to have to change 
anyway.

So the existence of a Plan object, or any other object that the client 
really has to know about anyway, does really effect reimplementation.
>
> Everything you've described can be done currently with the API as it's 
> written. All a client should need to know (at least this is my opinon 
> until I'm given a convincing counterpoint) is which packages are 
> changing, what way they're changing, and enough information about 
> those packages to answer questions the user may have. I believe the 
> API provides exactly that, and, if it's deficient, can be expanded.
This is definitely a question of aesthetics. Just because it is possible 
with the API doesn't mean that it is simple and easy to understand for 
users and easy to extent when needed.
>>>> I have yet to see a Python library that has a module called "api".  
>>>> Can you provide an example of another Python library that uses this 
>>>> pattern? As a counter-example, consider the CherryPy and PyOpenSSL 
>>>> libraries that we use.  CherryPy exposes the cherrypy and 
>>>> cherrypy.wsgiserver modules as public interfaces.  PyOpenSSL 
>>>> exposes the OpenSSL and OpenSSL.crypto modules.  Even for a large 
>>>> Python system like mercurial, there is no "api" module.  One just 
>>>> uses "from mercurial import hg" when writing an extension to 
>>>> mercurial.  Please rename the "api" module to represent what 
>>>> interface is actually being used.  I'd suggest "pkg.client.image", 
>>>> but that is already taken. ;-)  Maybe these classes could be moved 
>>>> into pkg/client/__init__.py in such a way that a programmer would 
>>>> just do "import pkg.client".
>>>>         
>>> Is this an objection to the naming of "api.py"?     
>> yes.
>>  
>>> If so, I'd happily take suggestions. I'm mildly against moving 
>>> things into __init__.py as I think it's a fairly python specific 
>>> construction which provides a needless barrier to entry to others 
>>> coming to use our code, something I'd suggest is something an API 
>>> should avoid where possible.
>>>     
>> Moving it into __init__.py makes it so that the only interface the 
>> users need to know about is "import pkg.client".
>> Seems pretty simple. I don't know why this would be a barrier to 
>> entry, assuming that there is appropriate documentation with the API.
>>   
> It's a barrier to entry because __init__.py exactly obviously a class 
> module. It's certainly not the first place anyone would go looking for 
> an api if they downloaded the gate.
Why would anyone wanting to use the API download the gate? When you use 
any other python module, do you download the source code for that module?

One of the primary parts of having an API is that it should be 
documented.  We need to have python manual-like documentation for the 
pkg(5) API.
> If I'm a newbie to python, I'd go look for the client.py file in pkg 
> directory, and immediately hit a brick wall, cause, wait, there's no 
> file named that there. This something that python does that many 
> languages don't (C, C++, Java, OCaml, I feel fairly certain don't have 
> a similar, commonly used organization. With less certainly, I'd add 
> scheme, ML, Perl and Prolog to that list.). It's the same reason I'd 
> avoid having the API return generator functions for example. I believe 
> we'd like the API to be accessible to programmers from all 
> backgrounds, not just Python wizards.
The solution to that is documentation.
>>> Also, the argument that others don't do this isn't, by itself 
>>> tremendously convincing to me. Certainly, I'll take another look at 
>>> how those programs are structured, but because CherryPy or PyOpenSSL 
>>> do things one way doesn't mean it's the right way for us to do it.
>>>     
>> When I do something that is different from every other example I see, 
>> it's either because I've come up with something no one else has 
>> thought of, or it's because that thing has already been rejected by 
>> everyone else. For me, I usually assume it is the latter.
>>   
> Or it could be because we have a different set of requirements, uses, 
> design, or beliefs in what good design/interface is. I also believe 
> smart people can look at the same problem and come up with different 
> solutions, each with different strengths and weaknesses. Choosing 
> between is a matter of priority, not correctness.
>>>> There is insufficient encapsulation in the API. - the PackageInfo 
>>>> object exposes version.DotSequence objects.  Is that intentional? 
>>>> i.e., is version.DotSequence part of the external API?
>>>>         
>>> The decision about how to expose Fmris, Versions, and Dot sequencies 
>>> hasn't been made yet. Mostly because I don't have a strong feeling 
>>> yet about what the right answer is. As answer to this and many of 
>>> your subsequent questions: this is an API in developement, really, 
>>> an intermediate API. Things will change. Things that are exposed may 
>>> become hidden, and things that are hidden may become exposed as I 
>>> discover information insufficiencies in the API.  (For example, 
>>> refresh currently returns an actual image. It does this because, 
>>> given the other functions implemented, there would be no way for the 
>>> GUI to update it's information after a refresh without rereading 
>>> from disk, something that seemed silly to do since that information 
>>> was already in memory. So, for now, it returns the image, a clear 
>>> violation of the spirit of the API, and something that will be 
>>> changed once it's more fully functional.)
>>>     
>> I'm confused then.  CR 3443 says that the purpose is to provide a 
>> "stable" API. But yet this is an intermediate API that is going to 
>> change. I understand that this has to be evolutionary, but it seems 
>> that there has to be a certain amount of stability before this is 
>> first used. My reason for bringing up these issues is that it doesn't 
>> seem this minimal level of stability is there yet.
>>   
> Ok, let me try to be as clear as I can be. When the API is done, it 
> will be "stable" (note that I don't mean ARC "stable", I'm not sure 
> which term would apply currently) which means that changes to it will 
> be made carefully, in coordination with client code to prevent 
> breakage. Once we've settled in with it, I would suggest we look at 
> upgrading the interface to ARC "stable", once we're satisfied it meets 
> the needs of clients.
>
> I could off and spent 2 months putting together an entire API. That 
> didn't meet the time constraints provided. It didn't meed the need of 
> getting the CLI and the GUI following the same code paths on critical 
> operations. Lastly, it's entirely possible I would've come back and 
> found out the API was deficient in ways I hadn't imagined. Creating 
> this API is going to be a rolling process, as everything else we do on 
> this project has been.
Given this, the existing API (pkg.client.image.Image, etc.) is the most 
stable API that we have right now. It is certainly the most expedient 
means to achieve the desired result.
>>>> - the PkgClient class requires a ProgressTracker object.  So is 
>>>> this API making ProgressTracker a public interface? Also, if one 
>>>> wants to use different progress trackers for different operations, 
>>>> how does one do that?  (for example, different progress trackers 
>>>> for loading the catalogs vs. making the install plan in the 
>>>> plan_install method.)
>>>>         
>>> Yes, as I've said previously in email here, the ProgressTracker is 
>>> part of the API.
>>>     
>> Is there a document that lists what classes and methods are all part 
>> of the API, similar to what we see in the Python manual for each module?
>>   
> Nope, I've been more focused on actually making the thing work.
> But, the list, as I know it today, is everything in api.py, 
> api_errors.py, progress.py, and bits and pieces of misc.py (which I 
> hope to get removed eventually, but I lost that battle recently, and 
> I'll focus on other things before returning to that issue again).
and fmri.py and parts of version.py (DotSequence).  I recommend making 
it a high priority to document this list.
>>> Can you present a use case for A) Why you'd want to switch progress 
>>> trackers partway through an operation B) Why you couldn't just make 
>>> your own hybrid of the progress tracker which does what you want it to?
>>>     
>> The updatetool GUI uses a QuietProgressTracker to load the catalogs 
>> and then uses a custom-defined progress tracker that implements a 
>> progress bar dialog for evaluating and executing the install plan.
>>
>> One could certainly implement a hybrid - that seems like a reasonable 
>> option.
> I'm also not opposed to adding a "set_progress_tracker" method, should 
> it be desperately needed.
>> [snip]
>>>> All exceptions should return data that can be extracted by the 
>>>> client. VersionException and several others are good examples of 
>>>> this.  But ProblematicPermissionsIndexException doesn't do this.
>>>>         
>>> What data would you like it to return?
>>>     
>> When I first read this, I didn't realize the "cause" is already 
>> available. So the data is the cause (with a clarification of what 
>> this is - it appears to be a string that is a pathname).
>>
>> There needs to be documentation for each exception saying what data 
>> is available with the exception.  In Java, this would be done with 
>> accessor methods and private data. Since accessor methods and private 
>> data is not generally the norm in Python, this has to be done with 
>> documentation.  Otherwise, how does the user of the exception know 
>> what data is available?
>>
>>   
> Again, once the API is done, there will be documentation. For now, 
> unless I decide to spend my time adding that instead of, say, 
> preventing the depot from falling over when sent a certain path, it'll 
> just have to wait for a bit and people can do exactly what they've 
> been doing up to now: looking at the exception to see what data it 
> provides. But at least now, there's a single location with the 
> exceptions, instead of sending them grepping through the code base to 
> find where the exception is defined.
>>>> What is the intent of the __activity_lock functionality?  We 
>>>> eventually need to have locking that prevents multiple clients (in 
>>>> separate processes) from modifying an image concurrently.  Once we 
>>>> have that, the same functionality will also work for different 
>>>> threads in a single process. So thread-based locking is not really 
>>>> useful here. It's also not clear to me that this API layer should 
>>>> be responsible for locking, as that may not provide sufficient 
>>>> granularity.
>>>>         
>>> The point of locking is so that, for example, multiple GUI threads 
>>> can't start planning an install, an uninstall, and an image_update 
>>> all at the same time on the same PkgClient object, for example. Can 
>>> this become more finely grained in the future? Probably. But for 
>>> now, they help make the code thread safe, something that's important 
>>> for the GUI. When we have the more general mechanism, perhaps, and 
>>> only perhaps, it can replace the mechanisms here.
>>>     
>> IMHO, cross process concurrency is a bigger issue. And solving that 
>> problem would solve the thread one too.
>>   
> I agree, but, that's not THIS problem. If and when we solve that 
> issue, the locking code in api can poof (probably). This API is not 
> going to fix all problems with our code. It's supposed to fix a select 
> few. Cross-process concurrency is not in that set. Of course, if 
> someone has a ready solution, by all means give me the patch, either 
> to the gate or on top of my API, and I'll integrate it into my putback 
> happily.
My point is that I don't see anything in issue 3443 that says the API 
has to fix the cross-thread problem either (except maybe for the word 
"safe" in the summary, but the description doesn't say anything about 
it). The introduction of locking into the API seems to be feature creep.
>>>> At what point would you expect the imports of the other 
>>>> pkg.client.* and pkg.* modules to be removed from client.py?
>>>>       
>>> When the API is complete, and not in development.
>>>    
>>>> (before or after the first checkin of this API?)
>>>>       
>>> After, since I foresee at least 2 stages of commits, and probably 3
>>>    
>>>>   I'd like to at least see a plan for how complicated this api 
>>>> module will get in order to be able to remove those imports, so 
>>>> that the true expense of this option will be understood before the 
>>>> team has to live with it.
>>>>         
>>> So, what you'd like is for me to do all the work needed to 
>>> completely analyze what the final API would look like (which 
>>> basically means creating and implementing the final API since, at 
>>> least for me, that's the only way to be certain I've gotten things 
>>> correct). That's fine, but there's little chance that can be 
>>> finished in time for the GUI team to move to it for the november 
>>> release. Until I receive firm direction that we've changed our 
>>> schedule on this, I'm going to proceed as I've planned.
>>>
>>> I'll make one other point, if this option, once it's fully in place, 
>>> is truly so terrible that we as a team "can't live with it", then we 
>>> can blow it up and someone else can try again. All that we'll have 
>>> lost is the time I've spent on it (and the amount of time we've 
>>> spent kvetching about it).
>>>     
>> In a previous message[1], Stephen outlined three general options for 
>> introducing a stable API.  It seems like you've picked option 1 
>> without even considering option 2 or 3.  In fact, you've started to 
>> write the code for option 1 without having looked at what a design 
>> for options 2 or 3 would even look like (at least I haven't seen such 
>> a design sent to the pkg-discuss alias, maybe options 2 and 3 were 
>> discarded having never made it that far).  Since this started, I've 
>> just been asking that there be a reasonable evaluation of these 
>> various options.  This isn't a question of trying one thing and then 
>> having someone else try another thing. If the various major design 
>> options are considered and then compared side by side, then an 
>> appropriate design can be selected. And if option 1 is selected, I 
>> can support that (as I've tried to do by providing these comments).
>>   
> No, I've considered 2 and 3, just because I haven't put them out to 
> the group doesn't mean I didn't think about them. (I'll group them 
> together as I believe they're essentially isomorphic transformations 
> of the code. While the details might differ, the overall structure 
> would be similar.) I viewed my job as to look at the various options, 
> and provide a solution to the problem. Had there been two options 
> which seemed to be reasonable alternatives, then I would've turned to 
> the group for guidance. Since, in my opinion, the choice was clear, I 
> moved ahead. I'd suggest that 2 or 3 is what you and some others have 
> proposed previously, so it's certainly been considered. I feel like 
> I've done the best I can to explain why I don't believe those are 
> adequate solutions to our problems. 
Did I miss that explanation?  I'm sorry if I passed that by. I'd like to 
take another look at the explanation for why #3 would not solve the 
problems.  The very first message I saw on the API is here: 
http://mail.opensolaris.org/pipermail/pkg-discuss/2008-September/006168.html 
. And this doesn't say anything about why the #3 approach wouldn't solve 
the problems.

Tom
> I get that you disagree. If you'd like, you can put together an API 
> which follows options 2 or 3, and we as a group can choose between 
> them. I'm not going to spend my time doing that because I've done it 
> internally, and found them lacking for our problems. You can also 
> continue to try to convince me that you're right, but please 
> understand that until I am convinced, I'm going to plow ahead because 
> we have deadlines to meet and other things to do.
>> Tom
>>
>> [1] 
>> http://mail.opensolaris.org/pipermail/pkg-discuss/2008-September/006464.html 
>>
>>  
>>>> Thanks.
>>>> Tom
>>>>
>>>>
>>>>
>>>> Brock Pytlik wrote:
>>>>  
>>>>      
>>>>> Here's the link:
>>>>> http://cr.opensolaris.org/~bpytlik/ips-api-v1/
>>>>>
>>>>> I haven't done as much cleanup as I usually try to do, but since 
>>>>> I'm ooto tomorrow, I thought it would be better to get this out 
>>>>> now, rather than next Monday.
>>>>>
>>>>> Other things that I know need to be improved, but that I think can 
>>>>> wait for another round of revisions:
>>>>> API Exception organization
>>>>> More complete functionality
>>>>> Automated testing of cancellation (I hope to work with the GUI 
>>>>> team to test this by hand in the interim)
>>>>> Versioning of the Progress tracker (perhaps)
>>>>> More Documentation
>>>>> Figure out the future approach to testing we'll take
>>>>>
>>>>> Thanks,
>>>>> Brock
>>>>> _______________________________________________
>>>>> pkg-discuss mailing list
>>>>> [email protected]
>>>>> http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
>>>>>               
>>>> _______________________________________________
>>>> pkg-discuss mailing list
>>>> [email protected]
>>>> http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
>>>>         
>>
>> _______________________________________________
>> pkg-discuss mailing list
>> [email protected]
>> http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
>>   
>

_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss

Re: [pkg-discuss] API Code Review

Reply via email to