BJörn Lindqvist wrote:
> On 10/1/06, Guido van Rossum <[EMAIL PROTECTED]> wrote:
>> On 9/30/06, Giovanni Bajo <[EMAIL PROTECTED]> wrote:
>>> It would be terrific if you gave us some clue about what is wrong in
>>> PEP355, so that the next guy does not waste his time. For instance, I
>>> find PEP355 incredibly good for my own path manipulation (much cleaner
>>> and concise than the awful os.path+os+shutil+stat mix), and I have
>>> trouble understanding what is *so* wrong with it.
>>>
>>> You said "it's an amalgam of unrelated functionality", but you didn't
>>> say what exactly is "unrelated" for you.
>> Sorry, no time. But others in this thread clearly agreed with me, so
>> they can guide you.
>
> I'd like to write a post mortem for PEP 355. But one important
> question that haven't been answered is if there is a possibility for a
> path-like PEP to succeed in the future? If so, does the path-object
> implementation have to prove itself in the wild before it can be
> included in Python? From earlier posts it seems like you don't like
> the concept of path objects, which others have found very interesting.
> If that is the case, then it would be nice to hear it explicitly. :)
Let me take a crack at it - I'm always good for spouting off an arrogant opinion :)

Part 1: "Amalgam of Unrelated Functionality"

To me, the Path module felt very much like the "swiss army knife" anti-pattern - a whole lot of functions that had little in common other than the fact that paths were involved.

More specifically, I think it's important to separate the notion of paths as abstract "reference" objects from filesystem manipulators. When I call a function that operates on a path, I want to clearly distinguish between a function that merely transforms the path string and one that actually hits the disk. This goes along with the "principle of least surprise" - it should never be the case that I cause an I/O operation to occur when I wasn't expecting it. For example, a function that computes the parent directory of a path should not IMHO be a sibling of a function which tests for the existence or readability of a file.

I tend to think of paths and filesystems as broken down into 3 distinct domains: locators, inodes, and files. I realize that not all file systems on all platforms use the term 'inode', and that their semantics differ somewhat, but they all have some object which fulfills that role.

-- A locator is an abstract description of how to "get to" a resource. A file path is a "locator" in exactly the sense that a URL is. Locators need not refer to 'real' resources in order to be valid. A locator to a non-existent resource still maintains a consistent structure, and can be manipulated and transformed without ever actually dereferencing it. A locator does not, however, have any properties or attributes - you cannot tell, for example, the creation date of a file by looking at its locator.

-- An inode is a descriptor that points to some actual content. It actually lives on the filesystem, and has attributes (such as creation date, last modified date, permissions, etc.).
-- 'Files' are raw content streams - they are the actual bytes that make up the data within the file. Files do not have 'names' or 'dates' directly in and of themselves - only the inodes that describe them do.

Now, I don't insist that everyone in the world should classify things the way I do - I'm just describing how I see it. Were I to come up with my own path-related APIs, they would most likely be divided into 3 sub-modules corresponding to the 3 subdivisions listed above. I would want to make it clear that when you are operating strictly at the locator level, you aren't touching inodes or files; when you are operating at the inode level, you aren't touching file content.

Part 2: Should paths be objects?

I should mention that while I appreciate the power of OOP, I am also very much against the kind of OOP-absolutism that has been taught in many schools of software engineering in the last two decades. There are a lot of really good, formal, well-thought-out systems of program organization, and OOP is only one of many.

A classic example is relational algebra, which forms the basis for relational databases - the basic notion that all operations on tabular data can be "composed" or "chained" in exactly the way that mathematical formulae can be. In relational algebra, you can take a view of a view of a view, or a subquery of a query of a view of a table, and so on. Even single, scalar values - such as the count of the number of results of a query - are of the same data type as a 'relation', and can be operated on as such, or fed as input to a subsequent operation.

I bring up the example of relational algebra because it applies to paths as well: there is a kind of "path algebra", where an operation on a path results in another path, which can be operated on further. Now, one way to achieve this kind of path algebra is to make paths objects, and to overload the various functions and operators so that they, too, return paths.
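To make the object-style "path algebra" concrete in Python terms, here is a small sketch using the `pathlib` module, which was only added in Python 3.4, long after this thread. `PurePosixPath` is a "pure" path class: it manipulates path structure without ever touching the disk, so it stays entirely at the locator level. The file name is made up for illustration.

```python
from pathlib import PurePosixPath

# A pure path: every operation below is string/structure manipulation only,
# so the file does not need to exist and no I/O is performed.
p = PurePosixPath("/etc/app/config.ini")

# Each operation returns another PurePosixPath, so results can be chained.
grandparent = p.parent.parent                      # /etc
backup = grandparent / "backup" / p.with_suffix(".bak").name
print(backup)                                      # /etc/backup/config.bak
```

The overloaded `/` operator is an example of the extra syntactic sugar the object style makes available.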
However, path algebra can be implemented just as easily in a functional style as in an object style. Properly done, a functional design shouldn't be significantly more bulky or wordy than an object design; the fact that the existing legacy API fails this test has more to do with history than with any inherent advantage of OOP over functional style. (Actually, the OOP approach has a slight advantage in terms of the amount of syntactic sugar available, but that is [a] an artifact of the current Python feature set, and [b] not necessarily a good thing if it leads to gratuitous, Perl-ish cleverness.)

As a point of comparison, the Java Path API and the C# .Net Path API have similar capabilities; however, the former is object-based, whereas the latter is functional and operates on strings. Having used both of them extensively, I find I prefer the C# style, mainly due to the ease of interconversion with regular strings - being able to read strings from configuration files, for example, and immediately operate on them without having to convert to path form. I don't find "p.GetParent()" much harder or easier to type than "Path.GetParent( p )"; but I do prefer "Path.GetParent( string )" over "Path( string ).GetParent()". However, this is only a *mild* preference - I could go either way, and wouldn't put up much of a fight about it. (I should note that the Java Path API does *not* follow my scheme of separation between locators and inodes, while the C# API does, which is another reason why I prefer the C# approach.)

Part 3: Does this mean that the current API cannot be improved?

Certainly not! I think everyone (well, almost) agrees that there is much room for improvement in the current APIs. They certainly need to be refactored and recategorized. But I don't think that the solution is to take all of the path-related functions and drop them into a single class, or even a single module.
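For contrast, here is a minimal sketch of the functional alternative: string-in, string-out path functions split into three namespaces that mirror the locator/inode/file division from Part 1. The class names `Locator`, `Inode`, and `Content` and all of their methods are hypothetical, invented only for this illustration; they are thin wrappers over real standard-library calls (`posixpath`, `os.stat`, `open`). `posixpath` is assumed so the string transformations behave identically on every platform; a real design would pick the platform's own flavour.

```python
import os
import posixpath
import tempfile


class Locator:
    """Hypothetical locator layer: pure string transformations, never any I/O."""

    @staticmethod
    def parent(path: str) -> str:
        return posixpath.dirname(path)

    @staticmethod
    def join(path: str, *parts: str) -> str:
        return posixpath.join(path, *parts)

    @staticmethod
    def with_extension(path: str, ext: str) -> str:
        root, _old_ext = posixpath.splitext(path)
        return root + ext


class Inode:
    """Hypothetical inode layer: metadata lookups that do hit the filesystem."""

    @staticmethod
    def exists(path: str) -> bool:
        return os.path.exists(path)

    @staticmethod
    def modified_time(path: str) -> float:
        return os.stat(path).st_mtime

    @staticmethod
    def size(path: str) -> int:
        return os.stat(path).st_size


class Content:
    """Hypothetical file layer: reads and writes the raw byte stream."""

    @staticmethod
    def read_bytes(path: str) -> bytes:
        with open(path, "rb") as f:
            return f.read()

    @staticmethod
    def write_bytes(path: str, data: bytes) -> None:
        with open(path, "wb") as f:
            f.write(data)


# Plain strings (e.g. read from a config file) work directly, with no
# conversion step; Locator calls never touch the disk.
cfg = "/etc/app/config.ini"
backup = Locator.join(Locator.parent(cfg), "backup", "config.bak")
print(backup)  # /etc/app/backup/config.bak

# Only the Inode and Content layers perform I/O.
with tempfile.TemporaryDirectory() as tmp:
    target = Locator.join(tmp, "notes.txt")
    Content.write_bytes(target, b"hello")
    print(Inode.exists(target), Inode.size(target))  # True 5
```

Because every `Locator` function takes and returns a plain `str`, it follows the `Path.GetParent( string )` calling style preferred above, and a reader can tell from the namespace alone whether a call can cause I/O.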
---

Anyway, I hope that (a) this answers your questions, and (b) it isn't too divergent from most people's views about Path.

-- Talin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev