Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
On Sat, Jul 29, 2006 at 05:35:30PM -0700, Andrew Pimlott wrote: On Sat, Jul 29, 2006 at 02:59:06AM +0200, Udo Stenzel wrote: Andrew Pimlott wrote: Second, foo is just as good a directory as foo/ to the system ...unless you have both (think Reiser4) or you want to create the file (I think, but I'm not sure). However, what's the point in being ambiguous when we can be explicit? Sometimes there is a difference, libraries and tools shouldn't gloss over that without consideration. As I said, it's one of those line-drawing exercises. But your points are well taken, and maybe the trailing delimiter should be part of the model. (My criterion has been whether any filesystem operations require the trailing delimiter. It sounds like with reiser4fs they might.) Actually, I just read in LWN that that part of reiser4 has been dropped. On the other hand, it was only dropped after considerable debate, and people using an older version of reiser4 still have the strange file-as-directory semantics. -- David Roundy http://www.darcs.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
Andrew Pimlott wrote: On Thu, Jul 27, 2006 at 09:59:37PM +0200, Udo Stenzel wrote: In fact, that's consistent with the current documentation, because
* getFileName "foo" == "foo"
* getFileName "foo/" == ""
I have to disagree with that. No, you don't. That's the current behaviour of Neil Mitchell's System.FilePath 0.9 according to the haddockumentation. There isn't much point in disagreeing about observable facts, is there? First of all, "" is not a filename; Most certainly it isn't. Which is all the more reason not to like the current design. An empty filename just isn't the same as no filename. if you mean that "foo/" has no filename, it makes much more sense to use something like a Maybe type. It does very much. In fact, I don't deem getFileName to be an essential function when a simple pattern match would do the same thing. "foo/" really doesn't have a file name, as it very explicitly names a directory. Second, "foo" is just as good a directory as "foo/" to the system ...unless you have both (think Reiser4) or you want to create the file (I think, but I'm not sure). However, what's the point in being ambiguous when we can be explicit? Sometimes there is a difference, and libraries and tools shouldn't gloss over that without consideration. But if you wish to make the distinction, at least provide an operation that lets me force a path to be treated file-wise or directory-wise. WTF?! A path names either a directory or a file. We might have some operations that accept file names instead of path names. What's there to be treated? Being explicit about the distinction makes any ambiguity go away. Filesystems are ugly. :-) So are microprocessors. We can still have a nice programming language, and we can also have a nice filesystem language. And it is about the slash: "foo" can be a directory. No, it still isn't.
We can distinguish between Directory (but not file, fifo, character or block special) and anything (if in doubt, not directory), which is an essential semantic distinction and not just the accidental presence of a slash (or backslash or colon or whatever $EXOTIC_OS uses). Also, parsing paths _once_ and printing them _once_ but doing everything else by operating on their logical structure makes specifying any intermediate operation a lot easier, if nothing else. If this thread shows anything, then it is that specifying path operations is harder than expected. Udo. -- Structure is _nothing_ if it is all you got. Skeletons _spook_ people if they try to walk around on their own. I really wonder why XML does not. -- Erik Naggum
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
Andrew Pimlott wrote: On Wed, Jul 26, 2006 at 05:06:41PM -0400, David Roundy wrote: This doesn't apply uniformly to all programs--except that we can say that any path with a trailing '/' is intended to be a directory, and if it's not, then that's an error. I thought some more about this, and I think the right way to handle this is on parsing and printing. Amen. After all, the trailing slash has no real meaning for any intermediate processing you might do. Here I beg to differ. I'd expect:
* setFileName "foo" "bar" == "bar"
* setFileName "foo/" "bar" == "foo/bar"
In fact, that's consistent with the current documentation, because
* getFileName "foo" == "foo"
* getFileName "foo/" == ""
No matter whether I'm correct, whether my expectation is natural or practical and whether others agree, the behaviour has to be clearly specified, and the final slash certainly isn't unimportant.
    readPath :: String -> (Path, Bool {- trailing delimiter -})
    showPath :: Path -> String
    showPathTrailingSlash :: Path -> String
This is far simpler than trying to figure out what the slash means for every path operation. It's also far uglier... besides, it isn't about the slash, it is about the difference between file and directory. Udo. -- If you cannot in the long run tell everyone what you have been doing, your doing was worthless. -- Erwin Schrödinger
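Udo's expected semantics can be pinned down with a small string-based sketch. The names getFileName and setFileName are taken from the 0.9 API under discussion, but these bodies are illustrative only: they assume '/' is the only separator and are not the library's actual code.

```haskell
-- A string-based sketch of the semantics described above, assuming '/'
-- is the only separator. Not the System.FilePath 0.9 implementation.

-- Everything after the last '/', or "" when the path ends in '/'.
getFileName :: String -> String
getFileName = reverse . takeWhile (/= '/') . reverse

-- Replace whatever follows the last '/' (possibly nothing) with a new name.
setFileName :: String -> String -> String
setFileName path name = reverse (dropWhile (/= '/') (reverse path)) ++ name
```

With these definitions the four bullet points hold as written: the trailing slash decides whether "foo" is replaced or extended.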
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
On Thu, 2006-07-27 at 11:07 -0700, Andrew Pimlott wrote: On Wed, Jul 26, 2006 at 04:02:31PM -0700, Andrew Pimlott wrote: I admit I don't know enough to say how the lpt1 issue should be handled. Is there any Win32 call I can make that will help me avoid accidentally opening these magic files? Say, if I call open with O_CREAT | O_EXCL? Unfortunately, I can find very little information on how one should handle this issue. Thanks to a suggestion from Bulat to use c_open, I was able to test O_WRONLY | O_CREAT | O_EXCL on Windows. In fact, Windows does allow files like nul to be opened (as many times as you like) with these flags, which I find dismaying. So I still don't know the proper way to handle them. You can open the file and test the file type with GetFileType. If it's type FILE_TYPE_CHAR then it's probably not what you wanted. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/fs/getfiletype.asp Duncan
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
On Thu, Jul 27, 2006 at 09:59:37PM +0200, Udo Stenzel wrote: Andrew Pimlott wrote: After all, the trailing slash has no real meaning for any intermediate processing you might do. Here I beg to differ. I'd expect:
* setFileName "foo" "bar" == "bar"
* setFileName "foo/" "bar" == "foo/bar"
In fact, that's consistent with the current documentation, because
* getFileName "foo" == "foo"
* getFileName "foo/" == ""
I have to disagree with that. First of all, "" is not a filename; if you mean that "foo/" has no filename, it makes much more sense to use something like a Maybe type. Second, "foo" is just as good a directory as "foo/" to the system, and they both denote the same filesystem object (the object with the name "foo" in the current directory), so it doesn't make sense to me for path operations to distinguish them. Maybe the second point is philosophical. But if you wish to make the distinction, at least provide an operation that lets me force a path to be treated file-wise or directory-wise.
    readPath :: String -> (Path, Bool {- trailing delimiter -})
    showPath :: Path -> String
    showPathTrailingSlash :: Path -> String
This is far simpler than trying to figure out what the slash means for every path operation. It's also far uglier... besides, it isn't about the slash, it is about the difference between file and directory. Filesystems are ugly. :-) And it is about the slash: "foo" can be a directory. Andrew
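Andrew's two suggestions (a Maybe result instead of "" for "no file name", and explicit operations to force file-wise or directory-wise treatment) can be sketched on plain strings. All three function names below are hypothetical and not part of System.FilePath 0.9; '/' is assumed to be the only separator.

```haskell
import Data.List (isSuffixOf)

-- Hypothetical helpers, assuming '/' is the only separator.

-- "No file name" is Nothing rather than the empty string.
fileNameOf :: String -> Maybe String
fileNameOf path =
  case reverse (takeWhile (/= '/') (reverse path)) of
    ""   -> Nothing          -- empty path, or one ending in '/'
    name -> Just name

-- Force directory-wise treatment by ensuring a trailing '/'.
asDirectory :: String -> String
asDirectory "" = "./"
asDirectory path
  | "/" `isSuffixOf` path = path
  | otherwise             = path ++ "/"

-- Force file-wise treatment by stripping trailing '/', but keep the root.
asFile :: String -> String
asFile path
  | not (null path) && all (== '/') path = "/"
  | otherwise = reverse (dropWhile (== '/') (reverse path))
```

Because the caller chooses asFile or asDirectory explicitly, the library itself never has to guess what a trailing slash meant.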
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
On Wed, Jul 26, 2006 at 04:02:31PM -0700, Andrew Pimlott wrote: I admit I don't know enough to say how the lpt1 issue should be handled. Is there any Win32 call I can make that will help me avoid accidentally opening these magic files? Say, if I call open with O_CREAT | O_EXCL? Unfortunately, I can find very little information on how one should handle this issue. Thanks to a suggestion from Bulat to use c_open, I was able to test O_WRONLY | O_CREAT | O_EXCL on Windows. In fact, Windows does allow files like nul to be opened (as many times as you like) with these flags, which I find dismaying. So I still don't know the proper way to handle them. Andrew
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
On Wed, Jul 26, 2006 at 05:06:41PM -0400, David Roundy wrote: cp(1), for example, treats paths with trailing separators differently from paths without. This doesn't apply uniformly to all programs--except that we can say that any path with a trailing '/' is intended to be a directory, and if it's not, then that's an error. But the trouble is that if you silently drop the '/', then the only way for me to implement a correct cp(1) in Haskell is to not use your proposed interface for pathname handling, which drops this information. I thought some more about this, and I think the right way to handle this is on parsing and printing. After all, the trailing slash has no real meaning for any intermediate processing you might do. So if the type used by my path operations is Path, I might have something like
    readPath :: String -> (Path, Bool {- trailing delimiter -})
    showPath :: Path -> String
    showPathTrailingSlash :: Path -> String
This is far simpler than trying to figure out what the slash means for every path operation. Andrew
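The three signatures proposed above can be filled in with a minimal sketch. The Path representation here (an absolute flag plus a list of components) is an assumption made for illustration, as is the helper splitOnSlash; '/' is the only separator and empty components from "//" are dropped.

```haskell
import Data.List (intercalate, isPrefixOf, isSuffixOf)

-- Assumed representation: absolute flag plus path components.
data Path = Path { isAbsolutePath :: Bool, components :: [String] }
  deriving (Eq, Show)

-- Split on '/', keeping empty pieces (the caller filters them out).
splitOnSlash :: String -> [String]
splitOnSlash s = case break (== '/') s of
  (piece, [])       -> [piece]
  (piece, _ : rest) -> piece : splitOnSlash rest

-- Parse once, remembering whether the input had a trailing delimiter.
readPath :: String -> (Path, Bool {- trailing delimiter -})
readPath s = (Path absolute comps, trailing)
  where
    absolute = "/" `isPrefixOf` s
    comps    = filter (not . null) (splitOnSlash s)
    -- The root "/" on its own is not counted as having a trailing delimiter.
    trailing = length s > 1 && "/" `isSuffixOf` s

showPath :: Path -> String
showPath (Path False [])    = "."
showPath (Path absolute cs) = (if absolute then "/" else "") ++ intercalate "/" cs

showPathTrailingSlash :: Path -> String
showPathTrailingSlash p
  | "/" `isSuffixOf` s = s
  | otherwise          = s ++ "/"
  where
    s = showPath p
```

A program like cp(1) would keep the Bool from readPath and decide at print time whether to call showPath or showPathTrailingSlash; everything in between works on Path alone.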
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
Hi hahaha! I admit I don't know enough to say how the lpt1 issue should be handled. Is there any Win32 call I can make that will help me avoid accidentally opening these magic files? No, because it's entirely possible to open these magic files; you'll just find that your output has accidentally appeared at your printer rather than on disk. BTW, it appears that wget itself does not handle it. :-) I know, but my hope is that HsWget will :) BTW, I guess wget should truncate the path at some number of characters. Fortunately, if we have FilePath == String, take n can be used, or more likely joinDirectories . take n . splitDirectories. Windows doesn't use UTF-16, NTFS does. I was under the impression that NT's Unicode support was conceived when it meant UCS-2. So it uses UCS-2 and not UTF-16, which would mean that you could in principle encounter lone surrogate characters or something equally nonsensical. Yep, true, it uses UCS-2. Windows has two sets of file-system-related functions, one for legacy 8-bit character sets, one for Unicode. What happens if I call the Unicode API on a FAT system that doesn't support it? Does it do a half-assed version of the locale-specific encoding that we deem impossible and wrong here? Of course :) And if you use the ANSI APIs on an NTFS system you'll also get some dodgy encoding. Ah, never mind, I get the strong feeling I really don't want to know all this. When even Windows 98 has been end-of-lifed we should rely on the Unicode API, if anything. Windows ME has not been end-of-lifed, and still has native 8-bit. Thanks Neil
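Neil's two truncation options can be sketched against today's filepath package. Its splitDirectories and joinPath are assumed here to be the closest present-day counterparts of the 0.9 split/join functions; the POSIX module is imported explicitly so the behaviour does not depend on the host OS. The names truncateChars and truncateDepth are hypothetical.

```haskell
import System.FilePath.Posix (joinPath, splitDirectories)

-- Keep at most n characters of the path (may cut a name in half).
truncateChars :: Int -> FilePath -> FilePath
truncateChars = take

-- Keep at most n path components, so no name is cut in half.
truncateDepth :: Int -> FilePath -> FilePath
truncateDepth n = joinPath . take n . splitDirectories
```

The component-wise version is usually the safer choice for something like wget, since it never produces a half-name that could collide with another file.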
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
On Thu, 2006-07-27 at 11:07 -0700, Andrew Pimlott wrote: On Wed, Jul 26, 2006 at 04:02:31PM -0700, Andrew Pimlott wrote: I admit I don't know enough to say how the lpt1 issue should be handled. Is there any Win32 call I can make that will help me avoid accidentally opening these magic files? Say, if I call open with O_CREAT | O_EXCL? Unfortunately, I can find very little information on how one should handle this issue. Thanks to a suggestion from Bulat to use c_open, I was able to test O_WRONLY | O_CREAT | O_EXCL on Windows. In fact, Windows does allow files like nul to be opened (as many times as you like) with these flags, which I find dismaying. So I still don't know the proper way to handle them. Interestingly, even Windows Explorer doesn't handle these odd files consistently. Renaming a file to "com1" is ignored with no error, though renaming to "com1.txt" gives an error about such a file already existing. Also, it seems that "com1.txt.txt" is not allowed either. I thought that the extension of "com1.txt.txt" was "txt", but it seems that it is "txt.txt", and so the base name is "com1" and thus not allowed. Duncan
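Duncan's observation suggests a simple pure check: take everything before the first '.', compare it case-insensitively against the reserved DOS device names, and reject the component if it matches. The sketch below uses the commonly cited list (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9). The function names are hypothetical, the argument is assumed to be a single path component, and other Windows quirks such as trailing spaces are not handled.

```haskell
import Data.Char (toUpper)

-- Commonly cited reserved DOS device names.
reservedDeviceNames :: [String]
reservedDeviceNames =
  ["CON", "PRN", "AUX", "NUL"]
    ++ ["COM" ++ show n | n <- [1 .. 9 :: Int]]
    ++ ["LPT" ++ show n | n <- [1 .. 9 :: Int]]

-- Everything from the first '.' onward is ignored, so "com1.txt.txt"
-- is rejected just like "com1".
isReservedDeviceName :: String -> Bool
isReservedDeviceName component =
  map toUpper (takeWhile (/= '.') component) `elem` reservedDeviceNames
```

This only detects the problem names; as the thread shows, what a program should do about them (rename, refuse, or open anyway) is a separate decision.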
Re: [Haskell-cafe] Re: ANN: System.FilePath 0.9
On 7/26/06, Neil Mitchell [EMAIL PROTECTED] wrote: The main purpose of canonicalPath is to fix the case on Windows, so c:\my documents\file.doc becomes C:\My Documents\file.doc if that is the case-correct version of the file. I think this function will not actually change the file with relation to the underlying file system, so should be race free. (I will document more to make the operation clearer) Hi Neil, It seems like your canonicalPath function is already in the base package. Look at System.Directory.canonicalizePath. I added it when I was working on the FilePath module for Cabal. The FilePath abstraction has been discussed a number of times, and it seems that people prefer an ADT representation instead of a plain String. I tend to agree. Maybe such an ADT-based library can be integrated with some new IO library like the Streams library. Cheers, Krasimir
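Krasimir's pointer can be illustrated by comparing two paths after canonicalizing both with System.Directory.canonicalizePath from the directory package, which makes a path absolute and resolves ".", ".." and symbolic links. The helper name samePathOnDisk is hypothetical. How canonicalizePath treats paths that do not exist has changed between versions of the directory package, so this is a sketch rather than a drop-in equalFilePath.

```haskell
import System.Directory (canonicalizePath)

-- Hypothetical helper: two paths are considered equal if they
-- canonicalize to the same absolute path.
samePathOnDisk :: FilePath -> FilePath -> IO Bool
samePathOnDisk a b = (==) <$> canonicalizePath a <*> canonicalizePath b
```

Unlike a purely string-based equality, this asks the file system, so its answer can change if the file system changes between the two calls.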
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
On Wed, Jul 26, 2006 at 03:36:13AM +0100, Neil Mitchell wrote: pathSeparator :: Char The character that separates directories. So what do I do with this? If I need it, it seems like the module has failed. Hopefully no one will ever use it. It's part of the low-level functions that the FilePath module builds on. However, pragmatically, someone somewhere will have a use for it, and the second they do they'll just write '/', and at that point we've lost. I'd just point out that I'm not aware of an operating system that GHC runs on that doesn't accept '/' as a path separator. It may be that you could find an OS where you could compile with jhc or run with hugs that doesn't use '/' (e.g. MacOS 9), but I wouldn't consider support for MacOS 9 a high priority at this stage. Since no one ought to need the path separator, and since they can currently assume '/' without loss of portability, it seems like adding in an extra function to protect us from the introduction of an operating system some time in the future that doesn't allow '/' as a path separator is a bit much. Of course, I may be wrong. Does Windows disallow mixing of '/' and '\\' as path separators? In darcs we always just use '/' as all the path separators, and it works fine... -- David Roundy
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
I'd just point out that I'm not aware of an operating system that GHC runs on that doesn't accept '/' as a path separator. It may be that you could find an OS where you could compile with jhc or run with hugs that doesn't use '/' (e.g. MacOS 9), but I wouldn't consider support for MacOS 9 a high priority at this stage. Since no one ought to need the path separator, and since they can currently assume '/' without loss of portability, it seems like adding in an extra function to protect us from the introduction of an operating system some time in the future that doesn't allow '/' as a path separator is a bit much. Of course, I may be wrong. Does Windows disallow mixing of '/' and '\\' as path separators? That's fine, at the file system level. However, some programs that are layered on top of that, for example the copy command in the shell, will bork on '/'. Also, on Windows '\' is fundamentally the separator, and '/' is a second-class separator. When showing paths to the user, it should always be '\', because that's the one that's right (TM) for the platform. Thanks Neil
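Neil's distinction (accept both characters when reading a Windows path, but always show '\' to the user) fits in two small functions. Both names are hypothetical; today's filepath package exposes the same idea through pathSeparator and isPathSeparator in System.FilePath.Windows, but the definitions below are standalone.

```haskell
-- Both characters count as separators when parsing a Windows path.
isWindowsSeparator :: Char -> Bool
isWindowsSeparator c = c == '\\' || c == '/'

-- Rewrite every '/' to '\\' before a path is shown to a Windows user.
toWindowsDisplay :: String -> String
toWindowsDisplay = map (\c -> if c == '/' then '\\' else c)
```

This keeps darcs-style internal use of '/' working while still giving users, and tools like the shell's copy command, the separator they expect.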
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
Andrew Pimlott wrote: The drive functions stand on their own as a chunk, and are possibly not well suited to a Posix system, but are critical for a Windows system. Why are they critical for portable code? I am fine with Windows-specific functions, but I think it's a mistake to bundle them [with] portable functions. I couldn't agree more. In fact, why can't we pretend the world is sane at least within Haskell and just put away those drive letters? My criticism is that your properties are all specified in terms of string manipulation. Exactly. I believe a FilePath should be an algebraic datatype. Most operations on that don't have to be specified, because they are simple and have an obvious effect. Add a system-specific parser and a system-specific renderer, maybe also define a canonical format, and the headaches stop. What's wrong with this?
    data FilePath = Absolute RelFilePath | Relative RelFilePath
    data RelFilePath = ThisDirectory | File String | ParentOf RelFilePath | String :|: RelFilePath
    parseSystemPath :: String -> Maybe FilePath
    renderSystemPath :: FilePath -> String
We can even clearly distinguish between the name of a directory in its parent and the directory itself. On Windows, the root directory just contains the drive letters and is read-only; drive-absolute-but-directory-relative paths are simply ignored (they are a dumb idea anyway). Separator characters are never exposed; all we need now is a mapping from Unicode to whatever the system wants. pathSeparator :: Char The character that separates directories. So what do I do with this? If I need it, it seems like the module has failed. Indeed. splitFileName "bob" == ("", "bob") "" is not a directory. Some problems just vanish:
    parseSystemPath "bob" == Just (Relative (File "bob"))
    splitFileName (Relative (File "bob")) = (Relative ThisDirectory, File "bob")
Windows: splitFileName "c:" == ("c:", "") "c:" is arguably not a directory.
    parseSystemPath "c:" == Nothing
    parseSystemPath "c:\\" == Just (Absolute ("C:" :|: ThisDirectory))
(Consider that dir c: lists the current directory on c:, not c:\) I'd rather ignore that altogether. Multiple roots with associated current directories are just a needless headache. Even a current directory is somewhat ill-fitted for a functional language like Haskell. getFileName "test/" == "" "" is not a filename.
    getFileName (Relative ("test" :|: ThisDirectory)) == error "pattern match failure"
Also, it looks from this that you treat paths differently depending on whether they end in a separator. Yet this makes no difference to the system. That seems wrong to me. Not to the system, but some programs like to make a difference. If you give rsync a path that doesn't end in a slash, it will take that to mean the directory. With a slash, it means the contents of the directory. The difference is an additional path component that ends up on the target file system or doesn't. getDirectory :: FilePath -> FilePath Get the directory name, move up one level. What does this mean, in the presence of dots and symlinks? You're right, this has to be ill-defined. Instead it should be
    moveUp :: FilePath -> IO FilePath
which would end up in the parent of the linked-to directory after following a symlink. Cutting off a component is done by simple pattern matching; no special functions are needed. Sorry for the rant, but this is Haskell, not Perl. We have true data types, not just strings... Udo. -- A politician is someone who calls a spade a portable, hand-operated digging implement. -- author unknown
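Udo's datatype can be made concrete with a simplified, POSIX-only sketch of the parser and renderer. The types are renamed (SysPath, RelPath) to avoid clashing with the Prelude's FilePath, the ParentOf constructor is omitted, and ".." is kept as an ordinary name, so this is an illustration of the idea rather than a complete design.

```haskell
infixr 5 :|:

data SysPath = Absolute RelPath | Relative RelPath
  deriving (Eq, Show)

data RelPath
  = ThisDirectory          -- the directory itself (trailing '/')
  | File String            -- a final component without trailing '/'
  | String :|: RelPath     -- a directory name followed by the rest
  deriving (Eq, Show)

parseSystemPath :: String -> Maybe SysPath
parseSystemPath ""         = Nothing
parseSystemPath ('/' : cs) = Just (Absolute (parseRel cs))
parseSystemPath cs         = Just (Relative (parseRel cs))

parseRel :: String -> RelPath
parseRel s = case break (== '/') s of
  ("", "")         -> ThisDirectory
  (name, "")       -> File name
  ("", _ : rest)   -> parseRel rest           -- collapse "//"
  (name, _ : rest) -> name :|: parseRel rest

renderSystemPath :: SysPath -> String
renderSystemPath (Absolute r)             = '/' : renderRel r
renderSystemPath (Relative ThisDirectory) = "./"
renderSystemPath (Relative r)             = renderRel r

renderRel :: RelPath -> String
renderRel ThisDirectory = ""
renderRel (File name)   = name
renderRel (d :|: rest)  = d ++ "/" ++ renderRel rest

-- "No file name" is a pattern-match result, not an empty string.
getFileName :: SysPath -> Maybe String
getFileName (Absolute r) = lastFile r
getFileName (Relative r) = lastFile r

lastFile :: RelPath -> Maybe String
lastFile ThisDirectory = Nothing
lastFile (File name)   = Just name
lastFile (_ :|: rest)  = lastFile rest
```

Here the trailing slash is recorded as ThisDirectory at parse time, so later operations see a directory rather than a string that happens to end in '/'.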
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
On Wed, 2006-07-26 at 15:29 +0200, Udo Stenzel wrote: My criticism is that your properties are all specified in terms of string manipulation. Exactly. I believe a FilePath should be an algebraic datatype. Most operations on that don't have to be specified, because they are simple and have an obvious effect. Add a system-specific parser and a system-specific renderer, maybe also define a canonical format, and the headaches stop. What's wrong with this? We've had this discussion before. The main problem is that all the current IO functions (readFile, etc.) use the FilePath type, which is just a String. So a new path ADT is fine if at the same time we provide a new IO library. That of course is an ongoing discussion in itself. So until we have the opportunity to change the FilePath type, there does seem to be value in providing a library that takes some of the complexity and portability nightmares out of using the existing FilePath type. Currently, real programs are doing even less principled hacking with strings. So an easy-to-use library that we can use now will be a great improvement even if it's not perfect.
    data FilePath = Absolute RelFilePath | Relative RelFilePath
    data RelFilePath = ThisDirectory | File String | ParentOf RelFilePath | String :|: RelFilePath
    parseSystemPath :: String -> Maybe FilePath
    renderSystemPath :: FilePath -> String
We can even clearly distinguish between the name of a directory in its parent and the directory itself. On Windows, the root directory just contains the drive letters and is read-only; drive-absolute-but-directory-relative paths are simply ignored (they are a dumb idea anyway). Separator characters are never exposed; all we need now is a mapping from Unicode to whatever the system wants. That's another portability headache: file name string encodings. Windows and OSX use encodings of Unicode. Unix uses strings of bytes. They are not fully inter-convertible.
On Unix the traditional technique is to keep a system file name in the original encoding and convert to Unicode to display to the user, but the Unicode version is never converted back to a system file name because it doesn't necessarily convert back to the same sequence of bytes. My point is it's not quite as simple as just making an ADT. (Consider that dir c: lists the current directory on c:, not c:\) I'd rather ignore that altogether. Multiple roots with associated current directories are just a needless headache. Even a current directory is somewhat ill-fitted for a functional language like Haskell. Much of the time it can be ignored. Sometimes programs have to deal with silly issues like this just because that is what the OS does and so you might get such a corner case as input and be expected to deal with it. (Though I admit this is a particularly obscure case.) So in my humble opinion the current discussion on the issues of semantics, names, IO or pure etc is worthwhile. Duncan
Re: [Haskell-cafe] Re: ANN: System.FilePath 0.9
On Jul 26, 2006, at 1:47 PM, Neil Mitchell wrote: Hi, Perhaps instead:
    directoryOf :: FilePath -> String
    filenameOf :: FilePath -> String
    extensionOf :: FilePath -> String
    basenameOf :: FilePath -> String
    replaceFilename = joinFilePath . directoryOf
    replaceDirectory = flip joinFilePath . filenameOf
Trying to design a consistent naming system, it helps if we all agree on what the various parts of a filepath are called; this is my draft of that: http://www-users.cs.york.ac.uk/~ndm/temp/filepath.png With a better name for basename, if anyone can think of one. "stem", perhaps? You could also, maybe, distinguish the short stem (everything before the extensions) from the long stem (everything before the last extension). Once we have that, how about
    takeElement :: FilePath -> String
    dropElement :: FilePath -> String
    replaceElement :: FilePath -> String -> FilePath
    addElement :: FilePath -> String -> FilePath
    splitElement :: FilePath -> (String, String)
    joinElement :: String -> String -> FilePath
With the restriction that not all of these are provided. Some don't make sense (splitBaseName, dropBaseName), some are implemented via combine (addFileName, joinFileName), some are redundant (addExtensions == addExtension). I'm also debating whether split/join should be exported, since they are less likely to be used and can easily be written as a take/drop pair. And of course, a bigger interface is harder to understand. Opinions on this? It's easier to tweak a specification than the actual code :) Thanks Neil Rob Dockins Speak softly and drive a Sherman tank. Laugh hard; it's a long way to the bank. -- TMBG
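The take/drop/replace/split scheme, together with Rob's short and long stem, can be sketched for the "extension" element. All function names are hypothetical and deliberately differ from today's filepath functions, whose edge-case behaviour may not match. The sketch works on a single file name (no directory part) and treats leading dots, as in ".bashrc", as part of the name.

```haskell
-- Split "foo.tar.gz" into ("foo.tar", ".gz"); no dot gives (name, "").
splitExt :: String -> (String, String)
splitExt name =
  case break (== '.') (reverse body) of
    (_, [])               -> (name, "")
    (revExt, _ : revStem) -> (prefix ++ reverse revStem, '.' : reverse revExt)
  where
    (prefix, body) = span (== '.') name

takeExt, dropExt :: String -> String
takeExt = snd . splitExt
dropExt = fst . splitExt

-- The new extension is expected to include its leading '.'.
replaceExt :: String -> String -> String
replaceExt name ext = dropExt name ++ ext

-- Rob's "long stem": everything before the last extension.
longStem :: String -> String
longStem = dropExt

-- Rob's "short stem": everything before the first extension.
shortStem :: String -> String
shortStem name = prefix ++ takeWhile (/= '.') body
  where
    (prefix, body) = span (== '.') name
```

As Neil notes, split is just take and drop paired up, which is one argument for leaving it out of a smaller interface.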
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
Duncan Coutts wrote: On Wed, 2006-07-26 at 15:29 +0200, Udo Stenzel wrote: Exactly. I believe a FilePath should be an algebraic datatype. We've had this discussion before. The main problem is that all the current IO functions (readFile, etc.) use the FilePath type, which is just a String. So what's better?
- use an ADT (correct and portable by construction), convert to String when calling the IO library
- fumble with Strings, use an unholy mix of specialized and general functions, trip over a corner case
So a new path ADT is fine if at the same time we provide a new IO library. We should just wrap the old API, filePathToString any parameters and liftIO the function while we're at it. That's another portability headache: file name string encodings. Windows and OSX use encodings of Unicode. Unix uses strings of bytes. Indeed. There are two ways out:
- declare that Unix uses Unicode too, take the appropriate conversion from the locale
- parameterize the FilePath ADT on the character type; you get (FilePath Word16) on Windows (which uses UCS-2, not UCS-4 and not UTF-16) and (FilePath Word8) on Unix; provide conversions from/to (FilePath String).
I tend towards the second option. It at least doesn't make anything worse than it already is. It's also irrelevant, since pretending the issue doesn't exist works equally well with an ADT. My point is it's not quite as simple as just making an ADT. Mine is that it is :) Moreover, a path already has internal structure. Those string manipulating functions either reconstruct the structure, then operate on that, then encode it back into a string, or implement an approximation to that. The latter leads to surprises, and making the former explicit can never hurt. Heck, NO library fumbles with strings, neither parsers nor pretty printers nor Network... why should a FilePath be different? Udo.
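Udo's "just wrap the old API" suggestion can be sketched in a few lines: convert the path ADT to a String only at the IO boundary, and lift the result into any MonadIO. The Path type and filePathToString below are hypothetical stand-ins for a real path ADT; readFileAt and writeFileAt are illustrative names.

```haskell
import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.List (intercalate)

-- Hypothetical stand-in for a real path ADT.
data Path = Path { absolute :: Bool, parts :: [String] }
  deriving (Eq, Show)

-- Render to the String-based FilePath that the existing IO functions expect.
filePathToString :: Path -> FilePath
filePathToString (Path False [])  = "."
filePathToString (Path abs' ps) = (if abs' then "/" else "") ++ intercalate "/" ps

readFileAt :: MonadIO m => Path -> m String
readFileAt = liftIO . readFile . filePathToString

writeFileAt :: MonadIO m => Path -> String -> m ()
writeFileAt p = liftIO . writeFile (filePathToString p)
```

The wrappers add no new IO behaviour, which is the point Udo makes against waiting for a new IO library before adopting a path ADT.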
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
On Wed, Jul 26, 2006 at 03:36:13AM +0100, Neil Mitchell wrote: It's a rats' nest to do it properly, but some very basic idea of "does this path have things which there is no way could possibly be in a file" (for example "c:\|file") is a useful thing to have. This seems to encourage the classic mistake of checking "not known bad" rather than "known good". "Known bad" is rarely useful in my experience. What use case do you have in mind? wget on Windows saves web pages such as "http://www.google.com/index.html?q=haskell" to the file "index.html?q=haskell". This just doesn't work, and is the main reason I added this in. I don't think it will be a commonly used operation. OK, this is a good use case. What should wget do if isValid fails? Certainly not abort the download. So isValid alone is no help. Well, you have makeValid as well, but this is even more of a rats' nest. There are a zillion different ways you might want this function to work, depending on your purposes. Should makeValid be system-dependent? Should it be reversible? How hard should it try to preserve the name verbatim? Should it prettify legal but unprintable characters? The answers are application-dependent. Also, wget has to worry about not just whether the filename is valid, but whether it currently exists, and in that case modify it. So attempting to provide a generic makeValid is quixotic and will only lead to misuse. My criticism is that your properties are all specified in terms of string manipulation. The whole point of paths is that they are interpreted by the system, so if you neglect to say what your operations mean to the system, what have you specified? True, but at the same time specifying what something means with respect to a filesystem is very hard :) If you had any insight into how this could be done I'd be interested. The first step is to think carefully about what operations to provide, and be conservative.
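For the wget case, a validity check and one possible repair policy can be sketched as below. The set of disallowed characters (< > : " / \ | ? * and control characters) is the commonly documented Windows set; the function names are hypothetical. As Andrew argues, the repair policy is application-dependent: this one simply replaces each bad character with '_', which is neither reversible nor collision-free. Reserved device names and trailing dots or spaces are not handled.

```haskell
import Data.Char (ord)

-- Characters Windows does not allow in a file name component.
invalidWindowsChars :: String
invalidWindowsChars = "<>:\"/\\|?*"

isBadChar :: Char -> Bool
isBadChar c = c `elem` invalidWindowsChars || ord c < 32

isValidWindowsName :: String -> Bool
isValidWindowsName name = not (null name) && not (any isBadChar name)

-- One possible policy among many: replace each bad character with '_'.
makeValidWindowsName :: String -> String
makeValidWindowsName ""   = "_"
makeValidWindowsName name = map (\c -> if isBadChar c then '_' else c) name
```

A real downloader would still need to check whether the repaired name already exists, which no pure function can decide.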
I think the operations I included in my library all have pretty clear meanings, though I don't claim to have nailed them down all the way. Criticism welcome. http://haskell.org/pipermail/libraries/2006-February/004890.html Hopefully no one will ever use it. It's part of the low-level functions that the FilePath module builds on. However, pragmatically, someone somewhere will have a use for it, and the second they do they'll just write '/', and at that point we've lost. Yes, on one hand you want to be pragmatic. But IMO this way of thinking--expose the guts just in case--is the path to madness. Not to mention, it clutters the API and makes it less clear how the module is supposed to be used. Maybe the guts could go into a separate module? splitFileName :: FilePath -> (String, String) Split a filename into directory and file. Which directory and which file? OK, that's probably the wrong description. "Splits off the last filename" would be a better description, leaving the rest. OK, but now what is the rest good for? And what is the last filename in cases like "/" or ".."? The conclusion I come to is that this operation is unsound to begin with, and should not be part of the API in any form. Also, it looks from this that you treat paths differently depending on whether they end in a separator. Yet this makes no difference to the system. That seems wrong to me. That was something I thought over for quite a while. If the user enters "directory/" then they do not mean the file called "directory", they mean the directory called "directory". And in Windows certainly you can't open a file called "file/". OK, fair, but "dir" and "dir/" are treated identically if dir is a directory, so it is still confusing for your library to distinguish them. Maybe the user needs to indicate whether a path represents a file or directory? These matters confuse your specification. I made the simplifying approximation that "foo" and "foo/" should be considered equivalent.
This may not turn out to be the right decision, but at least it helped me keep the semantics clear. getDirectory :: FilePath -> FilePath Get the directory name, move up one level. What does this mean, in the presence of dots and symlinks? It gets a parent directory; there may be one, but the one returned will be a parent. Is "/a" a parent of "/a/.."? That seems dubious. equalFilePath :: FilePath -> FilePath -> Bool Equality of two FilePaths. If you call fullPath first this has a much better chance of working. Note that this doesn't follow symlinks or DOSNAM~1s. As you acknowledge, it's a crap-shoot. So what's the point? It's a case of reality: at the moment people use == to test if two file paths are equal; at least this is a better test. Why is it better? I think of that as a separate module, because extensions have no meaning to the system and can be done with portable, functional code, as far as I understand. Not really. What about getExtension "file.ext\lump"? The answer is "" on Windows and ".ext\lump" on Posix. You would only call the
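Neil's getExtension example shows that even "pure" extension handling depends on which characters count as separators. The sketch below parameterizes the extension search by a separator predicate; posixSep, windowsSep and extensionWith are hypothetical names, and only the part after the last separator is searched for a '.'.

```haskell
-- Assumed separator sets for the two systems.
posixSep, windowsSep :: Char -> Bool
posixSep c   = c == '/'
windowsSep c = c == '/' || c == '\\'

-- The extension is everything from the last '.' in the final component.
extensionWith :: (Char -> Bool) -> String -> String
extensionWith isSep path =
  case break (== '.') (reverse lastComponent) of
    (_, [])         -> ""
    (revExt, _ : _) -> '.' : reverse revExt
  where
    lastComponent = reverse (takeWhile (not . isSep) (reverse path))
```

The same string yields "" under windowsSep and ".ext\lump" under posixSep, which is exactly the system dependence Neil describes.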
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
On Wed, Jul 26, 2006 at 03:29:01PM +0200, Udo Stenzel wrote: Andrew Pimlott wrote: Also, it looks from this that you treat paths differently depending on whether they end in a separator. Yet this makes no difference to the system. That seems wrong to me. Not to the system, but some programs like to make a difference. How does it make a difference? Do you have an answer that applies uniformly to all programs? If not, aren't we just walking down a blind alley? I've heard that Emacs treats double-separators specially. Do we account for that too? Maybe the trailing slash is important enough to take into account. But it complicates things, and the problem is hard enough without it. So I say, leave it out. (In my design, with a different type for different systems, it would be possible to create types for rsync paths, or Emacs paths, etc. That might be a better approach to the problem.) Andrew
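Andrew's "different type for different systems" idea can be sketched as one newtype per system plus a shared class for parsing and rendering. This is a minimal sketch under assumptions: the names `PathSyntax`, `PosixPath`, `WindowsPath` and `splitOnSep` are hypothetical, and components are plain Strings with no model of roots or drives.

```haskell
import Data.List (intercalate)

-- Hypothetical sketch: one path type per system. Components are kept
-- as plain Strings; a real design would also model roots and drives.
newtype PosixPath = PosixPath [String] deriving (Eq, Show)
newtype WindowsPath = WindowsPath [String] deriving (Eq, Show)

-- Each system supplies its own parser and renderer.
class PathSyntax p where
  parsePath :: String -> p
  renderPath :: p -> String

-- Split on every separator, keeping empty chunks so that rendering
-- can reproduce a leading or doubled separator.
splitOnSep :: (Char -> Bool) -> String -> [String]
splitOnSep isSep s = case break isSep s of
  (chunk, []) -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnSep isSep rest

instance PathSyntax PosixPath where
  parsePath = PosixPath . splitOnSep (== '/')
  renderPath (PosixPath cs) = intercalate "/" cs

-- Windows accepts both separators on input but renders with backslash.
instance PathSyntax WindowsPath where
  parsePath = WindowsPath . splitOnSep (`elem` "\\/")
  renderPath (WindowsPath cs) = intercalate "\\" cs
```

The same string parses differently depending on the type requested, e.g. `parsePath "a/b\\c" :: WindowsPath` has three components while the Posix reading has two. An rsync or Emacs path type would, in this scheme, just be one more instance.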
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
On Wed, Jul 26, 2006 at 04:34:50PM +0100, Duncan Coutts wrote: On Wed, 2006-07-26 at 15:29 +0200, Udo Stenzel wrote: Exactly. I believe, a FilePath should be an algebraic datatype. Most operations on that don't have to be specified, because they are simple and have an obvious effect. Add a system specific parser and a system specific renderer, maybe also define a canonical format, and the headaches stop. What's wrong with this? We've had this discussion before. The main problem is that all the current IO functions (readFile, etc) use the FilePath type, which is just a String. Geesh, just provide a few wrappers. If we make this a show-stopper, we'll never get there. That's another portability headache - file name string encodings. Windows and OSX use encodings of Unicode. Unix uses strings of bytes. They are not fully inter-convertible. On Unix the traditional technique is to keep a system file name in the original encoding and convert to Unicode to display to the user, but the Unicode version is never converted back to a system file name because it doesn't necessarily convert back to the same sequence of bytes. The only solution for this, IMO, is to provide different types for different systems. Hence my typeclass approach. (Which I'm not saying is good enough to cover for all these differences yet.) Andrew
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
Hi So what's better? - use an ADT (correct and portable by construction), convert to String when calling the IO library - fumble with Strings, use an unholy mix of specialized and general functions, trip over a corner case Or provide an ADT, demand people marshal to and from this ADT and not just cheat and use the string directly? Unfortunately people are lazy, I am one of them... We should just wrap the old API, filePathToString any parameters and liftIO the function while we're at it. How about class FilePathLike a where getRealFilePath :: a -> String Then convert readFile etc. to take a FilePathLike, rather than a FilePath? I'd be happy with that, and then you can write an ADT and pin down all the exact details, and the end user can then pick whatever they want to use. - declare that Unix uses Unicode too, take the appropriate conversion from the locale Unfortunately this is wrong, and will give the wrong answers. - parameterize the FilePath ADT on the character type, you get (FilePath Word16) on Windows (which uses UCS-2, not UCS-4 and not UTF-16) and (FilePath Word8) on Unix; provide conversions from/to (FilePath String). Windows doesn't use UTF-16, NTFS does. FAT doesn't. And what about the Samba drive I have mounted under Windows? Thanks Neil
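Neil's `FilePathLike` class can be written out in a few lines to show how plain Strings and a structured ADT could both be accepted by wrapped IO functions. A minimal sketch: `SimplePath` and `readFileLike` are hypothetical names, and the `[Char]` instance needs the FlexibleInstances extension.

```haskell
{-# LANGUAGE FlexibleInstances #-}

import Data.List (intercalate)

-- The class proposed above: anything that can be turned into the
-- String the existing IO functions expect.
class FilePathLike a where
  getRealFilePath :: a -> String

-- Plain String paths pass through unchanged (needs FlexibleInstances).
instance FilePathLike [Char] where
  getRealFilePath = id

-- A hypothetical structured path ADT, rendered with '/' separators.
data SimplePath = SimplePath {isAbsolute :: Bool, components :: [String]}

instance FilePathLike SimplePath where
  getRealFilePath (SimplePath absolute cs) =
    (if absolute then "/" else "") ++ intercalate "/" cs

-- A wrapped IO function that accepts either kind of path.
readFileLike :: FilePathLike p => p -> IO String
readFileLike = readFile . getRealFilePath
```

Existing String-based code keeps working unchanged, while an ADT user pays only for one instance declaration.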
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
Hi Ok, this is a good use case. What should wget do if isValid fails? isValid (makeValid x) == True makeValid is system dependent, and unspecified in its behaviour, although obviously some kind of closeness to the original would be ideal. So what if isValid fails and we don't have this? As you acknowledge, it's a crap-shoot. So what's the point? It's a case of reality, at the moment people use == to test if two file paths are equal, at least this is a better test. Why is it better? Because it's right more often, so pragmatically, it is better. I think of that as a separate module, because extensions have no meaning to the system and can be done with portable, functional code, as far as I understand. Not really, what about getExtension "file.ext\lump"? The answer is "" on Windows and ".ext\lump" on Posix. You would only call the extension functions on a segment name. So this system independent extension module is dependent on a platform specific FilePath module? Or do you demand people make two function calls to get the extension? I think having extensions in this module is the pragmatic and useful thing to do. Not to the system, but some programs like to make a difference. How does it make a difference? Do you have an answer that applies uniformly to all programs? If not, aren't we just walking down a blind alley? I've heard that Emacs treats double-separators specially. Do we account for that too? Haskell makes the difference, runInteractiveCommand vs runInteractiveProcess Maybe the trailing slash is important enough to take into account. But it complicates things, and the problem is hard enough without it. So I say, leave it out. Originally I left it out, but writing QuickCheck properties persuaded me to put it back in, because it seems to make things more regular. But I'm not massively tied to this, and I'm slowly thinking I might be wrong, although not convinced either way yet.
Thanks Neil
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
On Wed, 2006-07-26 at 19:41 +0200, Udo Stenzel wrote: Duncan Coutts wrote: On Wed, 2006-07-26 at 15:29 +0200, Udo Stenzel wrote: Exactly. I believe, a FilePath should be an algebraic datatype. We've had this discussion before. The main problem is that all the current IO functions (readFile, etc) use the FilePath type, which is just a String. So what's better? - use an ADT (correct and portable by construction), convert to String when calling the IO library - fumble with Strings, use an unholy mix of specialized and general functions, trip over a corner case In practise in the short term, the choice is between each application fumbling with strings in different incorrect ways or a library that fumbles with strings in a rather more considered and portable way. So a new path ADT is fine if at the same time we provide a new IO library. We should just wrap the old API, filePathToString any parameters and liftIO the function while we're at it. Try proposing something concrete and see if you can get it generally accepted. Perhaps you can get it accepted for the next major release of various Haskell implementations or for Haskell-prime. That's another portability headache - file name string encodings. Windows and OSX use encodings of Unicode. Unix uses strings of bytes. Indeed. There are two ways out: - declare that Unix uses Unicode too, take the appropriate conversion from the locale Sadly this does not work. For one thing you don't know that the locale you're using now was the locale of the program that wrote the file. This happens on multi-user systems where different users use different languages. Then there is the fact that converting from Unicode back to the file name is not guaranteed to give the same sequence of bytes. 
For example, see the section File Name Encodings in the glib api: http://developer.gnome.org/doc/API/2.0/glib/glib-Character-Set-Conversion.html - parameterize the FilePath ADT on the character type, you get (FilePath Word16) on Windows (which uses UCS-2, not UCS-4 and not UTF-16) and (FilePath Word8) on Unix; provide conversions from/to (FilePath String). I tend towards the second option. It at least doesn't make anything worse than it already is. It's also irrelevant, since pretending the issue doesn't exist works equally well with an ADT. Yeah, keeping it in the native format and doing no change of encoding is almost certainly the way to go. It doesn't address the issue of converting file names to/from displayable strings, but perhaps that's reasonable. My point is it's not quite as simple as just making an ADT. Mine is that it is :) Moreover, a path already has internal structure. Those string manipulating functions either reconstruct the structure, then operate on that, then encode it back into a string or implement an approximation to that. The latter leads to surprises and making the former explicit can never hurt. Heck, NO library fumbles with strings, neither parsers nor pretty printers nor Network... why should a FilePath be different? For compatibility with the Haskell98 IO library. There's also the issue here that adding in lots of conversions ADT <-> String means that people will not bother to use it and will continue to do things like: readFile (path ++ "/" ++ file) If anyone can actually design and implement an ADT that addresses most of these problems and can get it to work nicely with whatever is the popular IO system of the time then that'd be great. I think you'll find that it's not quite as simple as it looks. There was a discussion on a path ADT on the libraries list a while ago that's probably worth reading. I don't think it reached consensus.
Duncan
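The "keep it in the native format" option that Udo and Duncan converge on can be illustrated for the Unix case: the path stays a list of bytes, joining never decodes it, and conversion to a displayable String is deliberately one-way. A minimal sketch, assuming the hypothetical names `PosixBytesPath`, `joinBytes` and `displayPath`; the Windows (Word16) case is not shown.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical sketch: a Unix path kept as the raw bytes the kernel
-- sees, with no decoding step that could lose information.
newtype PosixBytesPath = PosixBytesPath [Word8] deriving (Eq, Show)

-- The byte value of ASCII '/'.
slash :: Word8
slash = 47

-- Join two native paths without decoding them. As with most path
-- combinators, an absolute second argument replaces the first.
joinBytes :: PosixBytesPath -> PosixBytesPath -> PosixBytesPath
joinBytes (PosixBytesPath []) q = q
joinBytes _ q@(PosixBytesPath (firstByte : _))
  | firstByte == slash = q
joinBytes (PosixBytesPath a) (PosixBytesPath b)
  | last a == slash = PosixBytesPath (a ++ b)
  | otherwise = PosixBytesPath (a ++ slash : b)

-- One-way conversion for display only: non-ASCII bytes become U+FFFD,
-- so the result must never be turned back into a path.
displayPath :: PosixBytesPath -> String
displayPath (PosixBytesPath bytes) = map toChar bytes
  where
    toChar b
      | b < 128 = chr (fromIntegral b)
      | otherwise = '\xFFFD'
```

A combinator like `joinBytes` is also the kind of thing that would have to be convenient enough to displace `path ++ "/" ++ file` in everyday code.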
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
On Wed, 2006-07-26 at 11:32 -0700, Andrew Pimlott wrote: On Wed, Jul 26, 2006 at 03:36:13AM +0100, Neil Mitchell wrote: It's a rat's nest to do it properly, but some very basic idea of "does this path have things which there is no way could possibly be in a file" - for example c:\|file - is a useful thing to have. This seems to encourage the classic mistake of checking "not known bad" rather than "known good". "Known bad" is rarely useful in my experience. What use case do you have in mind? wget on Windows saves web pages such as "http://www.google.com/index.html?q=haskell" to the file "index.html?q=haskell". This just doesn't work, and is the main reason I added this in. I don't think it will be a commonly used operation. Ok, this is a good use case. What should wget do if isValid fails? Certainly not abort the download. So isValid alone is no help. Well, you have makeValid as well, but this is even more of a rat's nest. There are a zillion different ways you might want this function to work, depending on your purposes. Should makeValid be system-dependent? Should it be reversible? How hard should it try to preserve the name verbatim? Should it prettify legal but unprintable characters? The answers are application-dependent. Also, wget has to worry about not just whether the filename is valid, but whether it currently exists, and in that case modify it. So attempting to provide a generic makeValid is quixotic and will only lead to misuse. Perhaps we should be more specific and make it talk about illegal file name characters if that is indeed the use case. Perhaps we should provide a system-dependent list of characters that are not allowed in file names. For example, on Windows that would include '?'. Then an application can decide for itself what to do about that depending on the context. It might be able to tell the user to pick a different name, or in the wget case replace it with a different character or remove it or something.
So maybe we should keep isValid but specify exactly what it checks. Then if it fails it's up to the application to decide how to fix it, possibly making use of the list of illegal characters. Duncan
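Duncan's proposal (a system-dependent list of illegal characters, an isValid that checks exactly that, and repair left to the application) might look like the following. This is a sketch under stated assumptions: the `Platform` type and all function names are hypothetical, the Posix rule is taken to be '/' and NUL, the Windows rule is the commonly documented set `<>:"/\|?*` plus control characters, and reserved device names such as LPT1 are not handled.

```haskell
import Data.Char (ord)

-- Hypothetical platform tag for this sketch only.
data Platform = Posix | Windows deriving (Eq, Show)

-- Assumed per-platform rule for a single file-name component:
-- Posix forbids '/' and NUL; Windows forbids <>:"/\|?* and the
-- control characters 0-31. Reserved device names are not covered.
illegalNameChar :: Platform -> Char -> Bool
illegalNameChar Posix c = c == '/' || c == '\0'
illegalNameChar Windows c = c `elem` "<>:\"/\\|?*" || ord c < 32

-- Report exactly which characters are the problem...
illegalChars :: Platform -> String -> String
illegalChars p = filter (illegalNameChar p)

isValidName :: Platform -> String -> Bool
isValidName p name = not (null name) && null (illegalChars p name)

-- ...and leave the repair policy to the application; a wget-like
-- tool might, for instance, substitute a replacement character.
replaceIllegal :: Platform -> Char -> String -> String
replaceIllegal p replacement = map fix
  where
    fix c
      | illegalNameChar p c = replacement
      | otherwise = c
```

Returning the offending characters rather than a bare Bool is what lets the caller tell the user what is wrong, or pick its own substitution.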
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
On Wed, Jul 26, 2006 at 08:19:47PM +0100, Neil Mitchell wrote: Ok, this is a good use case. What should wget do if isValid fails? makeValid is system dependent, and unspecified in its behaviour, although obviously some kind of closeness to the original would be ideal. So what if isValid fails and we don't have this? Sorry, I meant to say what I think wget should do. IMO, it should have a conservative set of allowed characters, encode the filename into that set using an escaping mechanism it specifies, attempt to open the file O_EXCL, and modify the name if that fails. The allowed characters set could perhaps come from the filepath module, though I suspect this is overkill. Simpler just to hard-code the set so that the name mangling is platform-independent and can be fully documented. As you acknowledge, it's a crap-shoot. So what's the point? It's a case of reality, at the moment people use == to test if two file paths are equal, at least this is a better test. Why is it better? Because it's right more often, so pragmatically, it is better. To me, that answer is unsatisfactory. I think of that as a separate module, because extensions have no meaning to the system and can be done with portable, functional code, as far as I understand. Not really, what about getExtension "file.ext\lump"? The answer is "" on Windows and ".ext\lump" on Posix. You would only call the extension functions on a segment name. So this system independent extension module is dependent on a platform specific FilePath module? Or do you demand people make two function calls to get the extension? One or the other, it seems a minor detail to me. There's nothing wrong with having the extension module use the filepath module. Not to the system, but some programs like to make a difference. How does it make a difference? Do you have an answer that applies uniformly to all programs? If not, aren't we just walking down a blind alley? I've heard that Emacs treats double-separators specially. Do we account for that too?
Haskell makes the difference, runInteractiveCommand vs runInteractiveProcess I'm not following. Andrew
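The name-mangling half of Andrew's wget suggestion (a hard-coded conservative character set plus a documented escape) can be sketched as a pure function. Assumptions, all hypothetical: the allowed set is ASCII letters, digits, '-', '_' and '.'; the escape is URL-style %XX; input characters are in the range 0..255 (for example URL bytes). It does not address Windows device names such as LPT1, which Neil raises later in the thread.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit, ord, toUpper)
import Numeric (showHex)

-- Hypothetical conservative set: ASCII letters, digits, '-', '_', '.'.
allowedChar :: Char -> Bool
allowedChar c = isAsciiLower c || isAsciiUpper c || isDigit c || c `elem` "-_."

-- Percent-escape one character as %XX (uppercase hex).
-- Assumption: characters are in the range 0..255, e.g. URL bytes.
percent :: Char -> String
percent c = '%' : pad (map toUpper (showHex (ord c) ""))
  where
    pad h = replicate (2 - length h) '0' ++ h

-- Map a name from the web onto the allowed set. '%' itself is escaped,
-- so the mapping is reversible, and "." / ".." are never produced.
escapeName :: String -> String
escapeName name
  | name `elem` [".", ".."] = concatMap percent name
  | otherwise = concatMap escape name
  where
    escape c
      | allowedChar c = [c]
      | otherwise = percent c
```

With this mapping, `escapeName "index.html?q=haskell"` gives `"index.html%3Fq%3Dhaskell"` on every platform, which is the platform independence Andrew argues for.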
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
On Wed, Jul 26, 2006 at 11:39:40AM -0700, Andrew Pimlott wrote: On Wed, Jul 26, 2006 at 03:29:01PM +0200, Udo Stenzel wrote: Andrew Pimlott wrote: Also, it looks from this that you treat paths differently depending on whether they end in a separator. Yet this makes no difference to the system. That seems wrong to me. Not to the system, but some programs like to make a difference. How does it make a difference? Do you have an answer that applies uniformly to all programs? If not, aren't we just walking down a blind alley? I've heard that Emacs treats double-separators specially. Do we account for that too? cp(1), for example, treats paths with trailing separators differently from paths without:

    rm -rf foo bar
    echo test > foo
    echo othertest > bar
    cp foo bar/
    cp foo bar

It's part of the user interface that allows the user to specify that he or she intends to use a path to describe a directory. This doesn't apply uniformly to all programs--except that we can say that any path with a trailing '/' is intended to be a directory, and if it's not, then that's an error. But the trouble is that if you silently drop the '/', then the only way for me to implement a correct cp(1) in Haskell is to not use your proposed interface for pathname handling, which drops this information. I'd also point out that reiser4, for instance, treats paths with a trailing slash differently even for files. True, it's probably not a good idea, but if we're talking about a portable library we might want it to work even on systems running an interesting filesystem like reiser4. -- David Roundy
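David's cp(1) argument is easier to see if the trailing '/' is kept as data. The sketch below records it in a hypothetical `UserPath` type and decides a cp-like destination purely, with the destination's current kind passed in (a real tool would get it via IO). The names and the error text are illustrative assumptions, not the behaviour of any particular cp implementation.

```haskell
-- Hypothetical sketch: keep the user's trailing '/' as data instead of
-- normalising it away, so a cp(1)-like tool can act on it.
data UserPath = UserPath
  { pathText :: String -- the path with trailing '/'s removed
  , wantsDirectory :: Bool -- did the user write a trailing '/'?
  }
  deriving (Eq, Show)

parseUserPath :: String -> UserPath
parseUserPath s = UserPath body trailing
  where
    stripped = reverse (dropWhile (== '/') (reverse s))
    body = if null stripped && not (null s) then "/" else stripped
    trailing = take 1 (reverse s) == "/"

-- What the destination currently is; a real tool would find out via IO.
data DestKind = Missing | IsFile | IsDirectory deriving (Eq, Show)

-- Decide where `cp src dest` writes. A trailing '/' demands that the
-- destination already be a directory; otherwise it is an error.
cpTarget :: String -> UserPath -> DestKind -> Either String String
cpTarget srcName dest kind = case (wantsDirectory dest, kind) of
  (_, IsDirectory) -> Right (inside (pathText dest) srcName)
  (True, _) -> Left (pathText dest ++ ": Not a directory")
  (False, _) -> Right (pathText dest)
  where
    inside dir name
      | take 1 (reverse dir) == "/" = dir ++ name
      | otherwise = dir ++ "/" ++ name
```

If the path library dropped the slash during parsing, the `(True, _)` case could not be distinguished from `(False, _)`, which is exactly the information loss David describes.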
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
Hi re: isValid Perhaps we should be more specific and make it talk about illegal file name characters if that is indeed the use case. Perhaps we should provide a system-dependent list of characters that are not allowed in file names. For example, on windows that would include '?'. Then an application can decide for itself what to do about that depending on the context. It might be able to tell the user to pick a different name, or in the wget case replace it with a different character or remove it or something. Unfortunately that's too much work for the user of the API. Since it tends to work on Posix, people probably won't go to the hassle of fixing it up. If they have a simple fix then there is at least a chance that they'll accept a patch that fixes the behaviour on Windows. The only options that I can see are either you want it fixed, or you want to get the user to fix it manually - both are catered for. And on Windows it's more complex: LPT1.txt is also an invalid file, but LPT1.txt.txt isn't. Trying to express the weirdness of Windows is probably beyond the chances of an API :) So maybe we should keep isValid but specify exactly what it checks. I'm happy to specify things in more detail; at the moment it's pretty much a no-op on Posix, but if any Posix user suggests that's wrong I'll happily fix it up. Sorry, I meant to say what I think wget should do. IMO, it should have a conservative set of allowed characters, encode the filename into that Not enough, because of the LPT1 issue - unless you add L as a disallowed letter :) Haskell makes the difference, runInteractiveCommand vs runInteractiveProcess I'm not following. Having some considerations towards a real path, one that can be used on the command line, is reasonable, I think, because Haskell has functions within it that distinguish between firing something at the underlying filepath vs at the console.
I don't however think it's worth having a special type for working with Emacs, unless you have System.FilePath.Emacs, given that Emacs is almost an operating system :) But the trouble is that if you silently drop the '/', then the only way for me to implement a correct cp(1) in Haskell is to not use your proposed interface for pathname handling, which drops this information. Ok, now I remember the reasons I kept the trailing slash, I'll leave it in. Esp. the reiser4 issue. Thanks Neil
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
Andrew Pimlott wrote: Maybe the trailing slash is important enough to take into account. No, not the trailing slash. The difference between a directory and its contents is important enough. This is usually encoded using a trailing slash, but I'd rather not worry about that detail in a program. What does Emacs do with double separators? I'm at a loss thinking of anything they could denote, but it could be useful. Udo. -- Guy Steele leads a small team of researchers in Burlington, Massachusetts, who are taking on an _enormous_challenge_ -- create a programming language better than Java. -- Sun.Com
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
On Wed, Jul 26, 2006 at 10:32:02PM +0100, Neil Mitchell wrote: Sorry, I meant to say what I think wget should do. IMO, it should have a conservative set of allowed characters, encode the filename into that Not enough, because of the LPT1 issue - unless you add L as a disallowed letter :) hahaha! I admit I don't know enough to say how the lpt1 issue should be handled. Is there any Win32 call I can make that will help me avoid accidentally opening these magic files? Say, if I call open with O_CREAT | O_EXCL? Unfortunately, I can find very little information on how one should handle this issue. BTW, it appears that wget itself does not handle it. :-) Incidentally, there seems to be another problem: The System.IO API provides no way to create a file, failing if it already exists (i.e., O_CREAT | O_EXCL). This is exactly what wget needs. BTW, I guess wget should truncate the path at some number of characters. Having some considerations towards a real path, one that can be used on the command line, is reasonable That's a great goal, it's just that we have to draw the boundary somewhere. At some point, you have to be explicit that this is a path for rsync or Emacs or whatever. Andrew
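The "modify the name if it fails" and "truncate at some number of characters" parts of the wget discussion can be expressed as a pure, lazy list of candidate names. A minimal sketch, assuming a hypothetical `candidateNames` helper and numbered ".1", ".2" suffixes; it also assumes `maxLen` is larger than any suffix it will need. Each candidate should then be created atomically with O_CREAT | O_EXCL, since checking for existence first and creating afterwards is racy. As noted above, System.IO does not offer that; on POSIX systems the unix package exposes it through the `exclusive` field of `OpenFileFlags`.

```haskell
-- Hypothetical helper: the names a wget-like tool could try in turn,
-- each cut to at most maxLen characters. Assumes maxLen is larger than
-- any suffix it will need.
candidateNames :: Int -> String -> [String]
candidateNames maxLen name = map fit ("" : ['.' : show n | n <- [1 :: Int ..]])
  where
    fit suffix = take (maxLen - length suffix) name ++ suffix
```

Because the list is lazy, the caller simply walks it until an exclusive create succeeds; for example `take 2 (candidateNames 6 "abcdefgh")` yields `["abcdef", "abcd.1"]`.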
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
Udo Stenzel [EMAIL PROTECTED] writes: No, not the trailing slash. The difference between a directory and its contents is important enough. This is usually encoded using a trailing slash, but I'd rather not worry about that detail in a program. What does Emacs do with double separators? I'm at a loss thinking of anything they could denote, but it could be useful. Double separators invoke ange-ftp mode from find-file.
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
At Wed, 26 Jul 2006 23:07:36 +0200, Udo Stenzel wrote: What does Emacs do with double separators? I'm at a loss thinking of anything they could denote, but it could be useful. You mean like, /path/to/somewhere//with/double/separator If so, it treats it as if you had typed in: /with/double/separator That can be useful if you do C-x C-f and you wish to ignore the default path it brings up. (Of course, it is only one character more to hit M-DEL). j.
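The Emacs behaviour described in the message (everything before a "//" is ignored) is a good example of a tool-specific path rule. Here is a hypothetical emulation of only that rule, to show why it belongs in a tool-specific module rather than a general path library. The function name is made up, and other Emacs conventions (such as "~" or the ange-ftp syntax mentioned earlier in the thread) are not modelled.

```haskell
import Data.List (tails)

-- Hypothetical emulation of the Emacs behaviour described above:
-- everything before the last "//" is dropped, keeping one leading '/'.
-- Other Emacs rules (such as "~" or ange-ftp syntax) are not modelled.
emacsCollapse :: String -> String
emacsCollapse path =
  case [rest | ('/' : '/' : rest) <- tails path] of
    [] -> path
    matches -> '/' : last matches
```

A general-purpose library would treat "a//b" as "a/b", so the two interpretations genuinely disagree. This supports Andrew's point that Emacs paths would need their own type.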
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
First, http://java.sun.com/j2se/1.5.0/docs/api/java/io/File.html may be useful to compare to. I learned C:\ is an absolute and C: is a relative path. What are the use cases? I can see four different types of file paths that one will want to manipulate. The following are all pure functions:

(Main) Local system compatibility: I must understand all the local files
1 create roots of the local filesystem(s) (e.g. C:\)
2 remove the last file or path element (easy except for symlinks).
3 create a new file or directory name at the end of a path and know that this is valid for the local filesystem (i.e. has no invalid or unencodable characters).
4 Make a way to get the local list of invalid characters
5 Provide an isValid test (and maybe a rootValid test)
6 Provide an invalidToSuggestedValid function that applies some policy for ensuring new paths can be coerced into a valid form. (Aside from LPT1 insanity?) The URL solution of %FF from rfc1630 could be used.
7 parsing the names of paths and files with respect to the local system, so you can handle a directory listing. (Unicode problems go here)

(Secondary) Maximal cross platform compatibility: I want mine to work everywhere
Provide 2-6 but for a conservative union of all bad characters. Handling the roots would be trickier.

(Tertiary) Specific platform compatibility: I want to work with platform Foo
Provide 2-6 but for the platform the user specifies. This may or may not be the current platform.

(Special) Handle conversion to and from file:// URI's

The (Secondary) could be accomplished as a special case of (Tertiary) by specifying the platform Most for instance. The (Main) could be accomplished as a special case of (Tertiary) by specifying the platform Local. None of the above depend on IO. None of the above really care about String vs ADT. The only one that truly and deeply cares about character set encoding is #7 on the local system. Mainly, the above just provides sets of invalid characters.
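The observation that the Secondary and Main cases are special cases of Tertiary (pick the platform "Most" or "Local") can be made concrete as a single function from a platform to its set of invalid characters. A minimal sketch: the `Platform` constructors mirror the message, but the character sets are assumptions (Posix: '/' and NUL; Windows: `<>:"/\|?*` plus control characters). `System.Info.os` is a real base function that reports "mingw32" for GHC on Windows.

```haskell
import Data.Char (chr)
import Data.List (nub)
import System.Info (os)

-- Hypothetical platform selector mirroring the use cases above.
data Platform = Posix | Windows | Most | Local deriving (Eq, Show)

-- Characters assumed invalid inside one name component.
invalidChars :: Platform -> [Char]
invalidChars Posix = "/\0"
invalidChars Windows = "<>:\"/\\|?*" ++ map chr [0 .. 31]
-- (Secondary): a conservative union of every platform's bad characters.
invalidChars Most = nub (invalidChars Posix ++ invalidChars Windows)
-- (Main): whatever the running system is; System.Info.os reports
-- "mingw32" for GHC on Windows.
invalidChars Local
  | os == "mingw32" = invalidChars Windows
  | otherwise = invalidChars Posix

isValidOn :: Platform -> String -> Bool
isValidOn p name = not (null name) && all (`notElem` invalidChars p) name
```

Everything here except `Local` is a pure function of its argument. That matches the point that none of these use cases needs IO.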
A makeCanonical pure function could remove . and .. in the syntactic way. But I can't see what else it could do without IO. Any IO based function can only be part of the (Main) Local system compatibility domain of operations. And the guarantees are weak due to race conditions. E.g. the makeCanonical_IO is a fancier operation that removes . and .. based on symlinks and upper/lower case matching based on what is in the filesystem.
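The pure, syntactic makeCanonical described above could look like this for '/'-separated paths. It is a sketch under assumptions: the behaviour for ".." at the root and for an empty result (returning "/" or ".") is my own choice. Because it never consults the filesystem, it gives the wrong answer when a component is a symlink, which is exactly why the IO variant is needed.

```haskell
import Data.List (intercalate)

-- Hypothetical purely syntactic canonicalisation of a '/'-separated
-- path. It never consults the filesystem, so when "a" is a symlink,
-- "a/.." need not name the same directory as ".".
makeCanonical :: String -> String
makeCanonical path
  | null body = if absolute then "/" else "."
  | otherwise = (if absolute then "/" else "") ++ body
  where
    absolute = take 1 path == "/"
    parts = filter (not . null) (splitSlash path)
    body = intercalate "/" (reverse (foldl step [] parts))

    -- The accumulator holds components in reverse order.
    step acc "." = acc
    step (x : xs) ".." | x /= ".." = xs
    step [] ".." | absolute = [] -- "/.." is "/"
    step acc part = part : acc

    splitSlash s = case break (== '/') s of
      (chunk, []) -> [chunk]
      (chunk, _ : rest) -> chunk : splitSlash rest
```

Leading ".." components of a relative path are kept (`"../x/../y"` becomes `"../y"`), since without IO there is no way to know what they refer to.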
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
[Sorry for the late reply.] On Wed, Jul 19, 2006 at 03:16:48AM +0100, Neil Mitchell wrote: I want to make sure a filename is valid. For example, prn and con This is another rat's nest, so I suggest that it be dealt with separately from the basic filepath module. The notion of valid is squishy: It depends entirely on what you intend to do with the path. It's a rat's nest to do it properly, but some very basic idea of "does this path have things which there is no way could possibly be in a file" - for example c:\|file - is a useful thing to have. This seems to encourage the classic mistake of checking "not known bad" rather than "known good". "Known bad" is rarely useful in my experience. What use case do you have in mind? In this library proposal, there are a bunch of xxxDrive functions that many Unix-oriented programmers are sure to ignore because they are no-ops on Unixy systems. Even on Windows, they are not very useful: I strongly agree about this. The temptation in path modules seems to be to throw in everything you can think of (without specifying any of it precisely), just in case someone finds it useful. The drive functions stand on their own as a chunk, and are possibly not well suited to a Posix system, but are critical for a Windows system. Why are they critical for portable code? I am fine with Windows-specific functions, but I think it's a mistake to bundle them with portable functions. (In my design, I have separate types for Windows and Unix paths, and imagine full support for Windows-specific operations, but only on the Windows type.) I have tried to specify the functions precisely, and I use this specification as a test suite. Currently there are 114 properties in this test suite, all can be seen on the haddock documentation. If you consider any function to be ambiguously specified, please say which one and I'll add extra tests until it gives you no surprises at all. My criticism is that your properties are all specified in terms of string manipulation.
The whole point of paths is that they are interpreted by the system, so if you neglect to say what your operations mean to the system, what have you specified? Here are some specific cases I take issue with. (Quotes are from your generated docs.) Sorry if I seem to be piling it on, but I think these matters are important for a good path library. pathSeparator :: Char The character that separates directories. So what do I do with this? If I need it, it seems like the module has failed. splitFileName :: FilePath -> (String, String) Split a filename into directory and file. Which directory and which file? splitFileName "bob" == ("", "bob") "" is not a directory. Windows: splitFileName "c:" == ("c:", "") "c:" is arguably not a directory. (Consider that dir c: lists the current directory on c:, not c:\) getFileName "test/" == "" "" is not a filename. Also, it looks from this that you treat paths differently depending on whether they end in a separator. Yet this makes no difference to the system. That seems wrong to me. setFileName :: FilePath -> String -> FilePath Set the filename. This is vague to me. E.g., what does it do with "/", which has no filename? getDirectory :: FilePath -> FilePath Get the directory name, move up one level. What does this mean, in the presence of dots and symlinks? normalise :: FilePath -> FilePath Normalise a file As Simon asked, when is this safe to use? equalFilePath :: FilePath -> FilePath -> Bool Equality of two FilePaths. If you call fullPath first this has a much better chance of working. Note that this doesn't follow symlinks or DOSNAM~1s. As you acknowledge, it's a crap-shoot. So what's the point? isValid :: FilePath -> Bool Is a FilePath valid, i.e. could you create a file like it? There are a whole host of reasons you might not be able to create a file. Which ones does this address?
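Andrew's objection that `""` is not a filename can be addressed the way Udo suggests earlier in the thread, by returning a Maybe. A minimal Posix-only sketch, using hypothetical names `splitFileNameMaybe` and `getFileNameMaybe`, in which a path ending in '/' (or the empty path) has no file name at all.

```haskell
-- Hypothetical Posix-only variant of splitFileName in which a path
-- ending in '/' (or the empty path) has no file name at all, rather
-- than the file name "".
splitFileNameMaybe :: String -> (String, Maybe String)
splitFileNameMaybe path =
  case break (== '/') (reverse path) of
    ([], _) -> (path, Nothing)
    (revName, revDir) -> (reverse revDir, Just (reverse revName))

getFileNameMaybe :: String -> Maybe String
getFileNameMaybe = snd . splitFileNameMaybe
```

The directory half is still `""` for a bare name such as "bob", so the other objection (that `""` is not a directory) would need a second Maybe or a "." convention.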
I tried to export a minimal set of operations that seem to me sufficient for everything not very platform-specific (though I am interested in counterexamples): Anything to do with file extensions? It's also important (I feel) for people to have easy access to common operations, but I guess that is a design decision. I think of that as a separate module, because extensions have no meaning to the system and can be done with portable, functional code, as far as I understand. Andrew
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
Hi It's a rat's nest to do it properly, but some very basic idea of "does this path have things which there is no way could possibly be in a file" - for example c:\|file - is a useful thing to have. This seems to encourage the classic mistake of checking "not known bad" rather than "known good". "Known bad" is rarely useful in my experience. What use case do you have in mind? wget on Windows saves web pages such as "http://www.google.com/index.html?q=haskell" to the file "index.html?q=haskell". This just doesn't work, and is the main reason I added this in. I don't think it will be a commonly used operation. The drive functions stand on their own as a chunk, and are possibly Why are they critical for portable code? I am fine with Windows-specific functions, but I think it's a mistake to bundle them with portable functions. I agree, and have now removed them. My criticism is that your properties are all specified in terms of string manipulation. The whole point of paths is that they are interpreted by the system, so if you neglect to say what your operations mean to the system, what have you specified? True, but at the same time specifying what something means with respect to a filesystem is very hard :) If you had any insight how this could be done I'd be interested. pathSeparator :: Char The character that separates directories. So what do I do with this? If I need it, it seems like the module has failed. Hopefully no one will ever use it. It's part of the low-level functions that the FilePath module builds on. However, pragmatically, someone somewhere will have a use for it, and the second they do they'll just write '/', and at that point we've lost. splitFileName :: FilePath -> (String, String) Split a filename into directory and file. Which directory and which file? Ok, that's probably the wrong description. "Splits off the last filename, leaving the rest" would be a better description. splitFileName "bob" == ("", "bob") "" is not a directory No, it's the rest in this context.
Windows: splitFileName "c:" == ("c:", "") "c:" is arguably not a directory. (Consider that dir c: lists the current directory on c:, not c:\) It's a bit weird on Windows, but certainly "c:" isn't a FileName, so that's the reason for this decision. getFileName "test/" == "" "" is not a filename. But "test/" is certainly not a file. Also, it looks from this that you treat paths differently depending on whether they end in a separator. Yet this makes no difference to the system. That seems wrong to me. That was something I thought over for quite a while. If the user enters "directory/" then they do not mean the file called directory, they mean the directory called directory. And on Windows you certainly can't open a file called "file/". setFileName :: FilePath -> String -> FilePath Set the filename. This is vague to me. E.g., what does it do with "/", which has no filename? "/" as the second element? I guess it's calling it out of spec if you use anything but a valid filename as the second argument, and the behaviour is undefined. If you do need to do something like that, then combine is the function. getDirectory :: FilePath -> FilePath Get the directory name, move up one level. What does this mean, in the presence of dots and symlinks? It gets a parent directory, there may be one, but the one returned will be a parent. normalise :: FilePath -> FilePath Normalise a file As Simon asked, when is this safe to use? Let me think, and then work on it so the answer is always. equalFilePath :: FilePath -> FilePath -> Bool Equality of two FilePaths. If you call fullPath first this has a much better chance of working. Note that this doesn't follow symlinks or DOSNAM~1s. As you acknowledge, it's a crap-shoot. So what's the point? It's a case of reality: at the moment people use == to test if two file paths are equal, and at least this is a better test. isValid :: FilePath -> Bool Is a FilePath valid, i.e. could you create a file like it? There are a whole host of reasons you might not be able to create a file.
Which ones does this address? I have added documentation which hopefully shows exactly what it tries to address. I think of that as a separate module, because extensions have no meaning to the system and can be done with portable, functional code, as far as I understand. Not really: what about getExtension "file.ext\lump"? The answer is "" on Windows and ".ext\lump" on Posix. This library isn't just a portability layer (although it does encompass that); it's mainly meant to make the things people do with filepaths easier, and by seducing them with ease of use, subtly tack in cross-platform portability. Thanks Neil
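To make some of the string-level behaviour discussed above concrete, here is a minimal sketch. The names splitLast, fileNameOf, extensionWith, posixExtension and windowsExtension are hypothetical and are not the System.FilePath 0.9 API; the sketch assumes '/' is the only Posix separator and treats both '/' and '\' as Windows separators. splitLast mirrors the quoted property splitFileName "bob" == ("", "bob"), fileNameOf follows the suggestion made in this thread of returning a Maybe rather than an empty filename, and extensionWith shows why the extension of file.ext\lump depends on the platform.

```haskell
-- Hypothetical helpers, not the System.FilePath 0.9 API.
-- '/' is assumed to be the only separator unless a predicate is passed in.
-- No special handling of ".", ".." or dotfiles.

-- Split off the last component, leaving the rest (trailing '/' included),
-- so that uncurry (++) (splitLast p) == p for every p.
splitLast :: FilePath -> (FilePath, String)
splitLast path = (reverse restRev, reverse nameRev)
  where
    -- Walk back from the end of the path to the last separator.
    (nameRev, restRev) = break (== '/') (reverse path)

-- Maybe-returning alternative: a path ending in '/' (or the root "/")
-- names no file, so say so explicitly instead of returning "".
fileNameOf :: FilePath -> Maybe String
fileNameOf path = case snd (splitLast path) of
  ""   -> Nothing
  name -> Just name

-- The extension is whatever follows the last '.' of the last component,
-- and what counts as "the last component" depends on the separator set.
extensionWith :: (Char -> Bool) -> FilePath -> String
extensionWith isSep path =
  case break (== '.') lastRev of
    (_, [])     -> ""                    -- no dot in the last component
    (extRev, _) -> '.' : reverse extRev
  where
    lastRev = takeWhile (not . isSep) (reverse path)

posixExtension, windowsExtension :: FilePath -> String
posixExtension   = extensionWith (== '/')
windowsExtension = extensionWith (`elem` "/\\")
```

In Haskell source "file.ext\\lump" is the string file.ext\lump, so posixExtension gives ".ext\lump" while windowsExtension gives "". Because splitLast keeps the trailing separator on the left, splitLast "test/" is ("test/", ""), the empty-filename case that fileNameOf reports as Nothing.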
Re: [Haskell-cafe] Re: ANN: System.FilePath 0.9
Neil Mitchell wrote: And if someone wants to define a new and better FilePath type, I would prefer something more abstract, such as a list of Path components, with functions to serialize it as a String and to parse it from a String. A list of path components is just not enough, I'm afraid. What about extensions? What about drives? If you want an abstract type it will probably need to be entirely abstract, rather than with some exposed structure. Why not just delete Unix and Windows from the equation altogether, and define a simple Haskell file system with something like:
newtype Path a = Path [a]
newtype Filename a = Filename a
data Origin a -- some abstract type
  deriving Eq -- this would be nice if it is possible to implement
data IString a => FileSpecifier a = FileSpecifier !(Origin a) !(Path a) !(Filename a)
instance IString ByteString.Char8 ...
instance IString String ...
Origins could be created by a factory appropriate to the underlying operating system (they would represent drives or volumes or mount points) - in any case a drive can't be mentioned in a program or the program wouldn't be portable! Although even with a nice rational reconstruction the monstrously unfortunate fact remains that Windows is case insensitive (how impossibly moronic!!!) and Unix isn't, so it is not possible to write code that will work the same for both OSes if one is required to use filenames that will look the same in other OS apps (i.e. the trick of encoding the complete Unicode char set in terms of legal filename chars is probably not acceptable). Anyway this is probably straying too far from what you are trying to do at the moment. Regards, Brian. -- Logic empowers us and Love gives us purpose. Yet still phantoms restless for eras long past, congealed in the present in unthought forms, strive mightily unseen to destroy us. http://www.metamilk.com
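Brian's sketch uses a datatype context, which modern GHC only accepts with the deprecated DatatypeContexts extension. The following is one way to make the idea compile, under stated assumptions: the IString class gets two illustrative methods (toStr and fromStr, both hypothetical), Origin is modelled as a plain wrapper whose constructor a real module would hide, posixRoot is a hypothetical Posix-only factory, and only a String instance is given (a ByteString.Char8 instance would be analogous but needs the bytestring package).

```haskell
{-# LANGUAGE FlexibleInstances #-}
-- Compilable sketch of the abstract-path idea above; names beyond Path,
-- Filename, Origin, FileSpecifier and IString are hypothetical.
newtype Path a = Path [a] deriving Eq
newtype Filename a = Filename a deriving Eq

-- Stands for a drive, volume or mount point; a real module would export
-- the type abstractly and hand out values from an OS-specific factory.
newtype Origin a = Origin a deriving Eq

-- The IString constraint moves from the data declaration to the functions.
data FileSpecifier a = FileSpecifier !(Origin a) !(Path a) !(Filename a)
  deriving Eq

class IString a where
  toStr   :: a -> String
  fromStr :: String -> a

instance IString String where   -- instance on [Char] needs FlexibleInstances
  toStr   = id
  fromStr = id

-- Hypothetical factory: a Posix system has a single origin, "/".
posixRoot :: IString a => Origin a
posixRoot = Origin (fromStr "/")

-- Serialise a specifier as a Posix path string.
renderPosix :: IString a => FileSpecifier a -> String
renderPosix (FileSpecifier (Origin o) (Path dirs) (Filename f)) =
  toStr o ++ concatMap (\d -> toStr d ++ "/") dirs ++ toStr f
```

For example, renderPosix (FileSpecifier posixRoot (Path ["usr", "lib"]) (Filename "libc.so")) yields "/usr/lib/libc.so". Keeping the type itself unconstrained and putting IString on the functions is the usual idiom now that datatype contexts are deprecated.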
Re: [Haskell-cafe] Re: ANN: System.FilePath 0.9
Neil Mitchell writes: We should avoid referring to $PATH as the path, since we already have FilePath. Agreed, but I couldn't come up with a better name, if anyone has any suggestions. searchPath? -- David Menendez [EMAIL PROTECTED] http://www.eyrie.org/~zednenem/
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
Hi I want to make sure a filename is valid. For example, prn and con... This is another rat's nest, so I suggest that it be dealt with separately from the basic filepath module. The notion of valid is squishy: It depends entirely on what you intend to do with the path. It's a rat's nest to do it properly, but some very basic idea of "does this path have things which could not possibly be in a file" (for example c:\|file) is a useful thing to have. By making it pure, there is no risk of the result being different. I see the isValid guarantee more as "False means it definitely isn't valid", rather than the other way round. In this library proposal, there are a bunch of xxxDrive functions that many Unix-oriented programmers are sure to ignore because they are no-ops on Unixy systems. Even on Windows, they are not very useful: I strongly agree about this. The temptation in path modules seems to be to throw in everything you can think of (without specifying any of it precisely), just in case someone finds it useful. The drive functions stand on their own as a chunk, and are possibly not well suited to a Posix system, but are critical for a Windows system. Ignoring these, which would you consider worthy of removal? Some are strictly redundant, but quite useful - for example isAbsolute/isRelative, which are the negation of each other. I have tried to specify the functions precisely, and I use this specification as a test suite. Currently there are 114 properties in this test suite, all of which can be seen in the haddock documentation. If you consider any function to be ambiguously specified, please say which one and I'll add extra tests until it gives you no surprises at all. QuickCheck rules :) I tried to export a minimal set of operations that seem to me sufficient for everything not very platform-specific (though I am interested in counterexamples): Anything to do with file extensions?
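The "False means it definitely isn't valid" guarantee can be illustrated with a small Windows-flavoured check. This is a sketch only: plausiblyValid and splitSeps are hypothetical names, and the rules (a handful of reserved characters, control characters, and the device names CON, PRN, AUX, NUL, COM1 to COM9 and LPT1 to LPT9, also when followed by an extension) are an illustrative subset of Windows naming rules, not a complete list.

```haskell
import Data.Char (isAlpha, ord, toUpper)

-- Hypothetical "known bad" check: False means the path definitely cannot
-- name an ordinary Windows file; True only means none of these particular
-- problems was found. The rules are an illustrative subset.
plausiblyValid :: FilePath -> Bool
plausiblyValid path = not (any badComponent (splitSeps rest))
  where
    -- Strip an optional drive prefix such as "c:" so its colon is allowed.
    rest = case path of
      (d : ':' : xs) | isAlpha d -> xs
      _                          -> path
    badComponent c = any badChar c || isDevice c
    badChar ch = ch `elem` "<>:\"|?*" || ord ch < 32
    -- Device names stay reserved even with an extension, e.g. "con.txt".
    isDevice c = map toUpper (takeWhile (/= '.') c) `elem` devices
    devices = ["CON", "PRN", "AUX", "NUL"]
      ++ [prefix ++ show n | prefix <- ["COM", "LPT"], n <- [1 .. 9 :: Int]]

-- Split on either Windows separator, dropping empty pieces.
splitSeps :: FilePath -> [String]
splitSeps s =
  case break (`elem` "\\/") s of
    (chunk, [])       -> keep chunk
    (chunk, _ : more) -> keep chunk ++ splitSeps more
  where
    keep c = [c | not (null c)]
```

This rejects the c:\|file example, wget's index.html?q=haskell and file\con, while accepting console.txt. A True result says only that no known-bad pattern occurred, which is precisely the asymmetry being debated.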
It's also important (I feel) for people to have easy access to common operations, but I guess that is a design decision. Thanks Neil
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
Hi, In this library proposal, there are a bunch of xxxDrive functions .. [remove them] I strongly agree about this. I have decided you are right; on Windows getDrive x can be written simply as: getDrive x | isRelative x = "" | otherwise = head (getDirectories x) And given that people probably shouldn't be playing with drives anyway, if they do want to, they can do a bit more work. All the drive-related functions are therefore removed from the interface. I have also added a canonicalPath function, support for spotting file\con as invalid and fixing it, support for \\?\ paths (if you don't know what they are, don't look it up, they are quite painful!) and fixes for a few very obscure corner cases which broke some of the properties. Does anyone have any other thoughts or comments? Thanks Neil
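For comparison, here is a minimal, self-contained sketch of drive splitting for Windows-style paths that does not rely on isRelative or getDirectories. splitDriveSketch and driveOf are hypothetical names; only letter drives are recognised, and UNC shares and \\?\ paths are deliberately out of scope.

```haskell
import Data.Char (isAlpha)

-- Hypothetical sketch: split a Windows-style path into drive and remainder.
-- Only letter drives ("c:" or "c:\") are recognised; UNC paths such as
-- \\server\share and \\?\ paths are deliberately left out.
splitDriveSketch :: FilePath -> (String, FilePath)
splitDriveSketch (d : ':' : sep : rest)
  | isAlpha d && sep `elem` "\\/" = ([d, ':', sep], rest)  -- e.g. c:\foo
splitDriveSketch (d : ':' : rest)
  | isAlpha d = ([d, ':'], rest)                           -- c: or c:foo
splitDriveSketch path = ("", path)                         -- no drive

driveOf :: FilePath -> String
driveOf = fst . splitDriveSketch
```

Drive-relative paths such as c:foo are not absolute yet still carry a drive, which is one reason drive handling is fiddly to get right.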
Re: [Haskell-cafe] Re: ANN: System.FilePath 0.9
On 18/07/06, Stephane Bortzmeyer [EMAIL PROTECTED] wrote: For instance, many lazy (not in the Haskell meaning) programmers believe that the path is safe if it does not include .. but it is false (hint: ../foo/bar is a legal path on Unix). I believe this does not cause trouble. If it is a shell expression, it will go one level up. However, when treated as a filesystem path alone it will stay beneath. After all, the filesystem does not interpret quotation marks. Regards, Piotr Kalinowski -- Intelligence is like a river: the deeper it is, the less noise it makes
Re: [Haskell-cafe] Re: ANN: System.FilePath 0.9
Stephane Bortzmeyer wrote: On Mon, Jul 17, 2006 at 03:07:51AM +0100, Neil Mitchell [EMAIL PROTECTED] wrote a message of 64 lines which said: How about adding something like restrictFilePaths :: FilePath -> IO () which will restrict the area that can be played with to that beneath the given FilePath? If someone does so, be aware that it is *not* trivial to write it securely. For instance, many lazy (not in the Haskell meaning) programmers believe that the path is safe if it does not include .. but it is false (hint: ../foo/bar is a legal path on Unix). That is a legal path if your Haskell program invokes (perhaps indirectly) a Unix shell. But if you can inject strings into a shell invocation then it is obviously impossible to do anything about limiting it to be weaker than the IO monad. -- Chris
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
On Sun, Jul 16, 2006 at 08:43:31PM -0500, Brian Smith wrote: I kind of expect that a Haskell library for file paths will use the type system to ensure some useful properties about the paths. It's a nice idea, but I claim that it's a rat's nest. Path semantics, when you look hard at them, are too vague, confusing, and subtle to encode usefully in types. And I think there's a better way to do what you're asking for. For example, when writing security-conscious software I often want to be able to distinguish between absolute, ascending (paths with leading ../ and similar items), and descending paths (paths that contain no ../). My suggestion is to specify your own syntax and semantics for the input to your software, which I assume is coming in over the network or some other trust boundary. By resisting the temptation to piggy-back on native paths, you control what your paths mean, instead of leaving it to the system. Further, for many applications, users don't really care that their paths map to filesystem paths. If you keep them separate, you can even change your storage from the filesystem to something else. In your case, either define your path syntax not to allow .., or define your own simple normalization rules, and apply them before you try to combine a user-supplied path with a native system path. E.g., the user gives ../a/../b/c/d/.., and you either reject it or turn it into /b/c, and then append it to your root, e.g. /root/b/c. Of course, you might make further restrictions in your paths, like only allowing letters, etc. I want to make sure a filename is valid. For example, prn and con are not valid path elements for normal files on Windows, certain characters are not allowed in filenames (depending on platform), some platforms may require paths to be escaped in different ways. I see there is an isValid function and even a (magical) makeValid function, but they do not report what was wrong with the filename in the first place.
Furthermore, it isn't clear from the documentation how these functions determine whether a filename is valid. This is another rat's nest, so I suggest that it be dealt with separately from the basic filepath module. The notion of valid is squishy: It depends entirely on what you intend to do with the path. There are many cases to consider: on Linux, which characters are allowed depends on the filesystem type, and special files may appear anywhere and have any name--the only way to test for them is by doing IO. Oh, and who knows if the situation might change between when you call isValid and when you actually perform the operation? IMO, safety is the most important issue regarding file paths and it is not addressed in this library as far as I can see. Writing code to handle these issues is tedious, error-prone, and boring to write despite being critical. It isn't the kind of code that you want to just download off of some guy's webpage. Basically, it is exactly the type of thing that belongs in a standard library. My approach is not to take a filepath and say, is it safe? (which can't be meaningfully answered in general anyway), but to construct paths in a careful manner that is safe for your application. In this library proposal, there are a bunch of xxxDrive functions that many Unix-oriented programmers are sure to ignore because they are no-ops on Unixy systems. Even on Windows, they are not very useful: I strongly agree about this. The temptation in path modules seems to be to throw in everything you can think of (without specifying any of it precisely), just in case someone finds it useful. 
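Andrew's suggestion of defining your own path syntax and normalisation rules for untrusted input can be sketched as follows. Everything here is an assumption for illustration: user paths use '/' only, empty and "." components are ignored, and ".." is either clamped at the application's root (clampComponents) or rejected outright (rejectParents); underRoot then appends the result to a trusted Posix-style root. All function names are hypothetical.

```haskell
-- Hypothetical sketch: user-supplied paths use '/' only; "" and "."
-- components are ignored; ".." is clamped at the root or rejected.

splitSlash :: String -> [String]
splitSlash s =
  case break (== '/') s of
    (chunk, [])       -> [chunk]
    (chunk, _ : more) -> chunk : splitSlash more

-- Clamp at the root: "../a/../b/c/d/.." becomes ["b", "c"].
clampComponents :: String -> [String]
clampComponents = reverse . foldl step [] . splitSlash
  where
    step acc ""   = acc          -- from "//" or a leading/trailing '/'
    step acc "."  = acc
    step acc ".." = drop 1 acc   -- drop 1 [] == [], so ".." at the root is a no-op
    step acc c    = c : acc

-- Stricter policy: refuse any ".." outright.
rejectParents :: String -> Maybe [String]
rejectParents s
  | ".." `elem` parts = Nothing
  | otherwise         = Just [p | p <- parts, p /= "", p /= "."]
  where
    parts = splitSlash s

-- Append cleaned components to a trusted native root (Posix-style here).
underRoot :: FilePath -> [String] -> FilePath
underRoot root cs = root ++ concatMap ('/' :) cs
```

With the clamping rule, ../a/../b/c/d/.. becomes b/c, and underRoot "/root" turns it into /root/b/c. A leading '/' is simply ignored, so an absolute user path is also confined beneath the root. A real application might further restrict which characters a component may contain, and none of this protects against symlinks already present under the root.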
I posted a more minimalist module a while back: http://haskell.org/pipermail/libraries/2006-February/004890.html I tried to export a minimal set of operations that seem to me sufficient for everything not very platform-specific (though I am interested in counterexamples):
currentPath :: p
prefixes :: p -> [(p, ChildName)]
addChild :: Monad m => p -> ChildName -> m p
append :: Monad m => p -> p -> m p
getChildren :: p -> IO [p]
canonicalize :: p -> IO p
See the referenced message for explanation. Andrew
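Here is a hedged reconstruction of the pure part of the interface listed above. The assumptions are ours, not Andrew's: failure in addChild and append goes through MonadFail, since modern GHC moved fail out of Monad; prefixes pairs each ancestor with the next component name, which is a guess rather than something taken from the referenced message; PosixPath is a toy instance; and the IO methods getChildren and canonicalize are omitted.

```haskell
-- Hypothetical reconstruction of the pure methods only.
type ChildName = String

class PathLike p where
  currentPath :: p
  prefixes    :: p -> [(p, ChildName)]   -- guessed: (ancestor, next name)
  addChild    :: MonadFail m => p -> ChildName -> m p
  append      :: MonadFail m => p -> p -> m p

-- Toy Posix-style instance: absolute flag plus components, outermost first.
data PosixPath = PosixPath { absolute :: Bool, components :: [ChildName] }
  deriving (Eq, Show)

instance PathLike PosixPath where
  currentPath = PosixPath False []
  prefixes (PosixPath abs' cs) =
    [ (PosixPath abs' (take i cs), c) | (i, c) <- zip [0 ..] cs ]
  -- A child name must be a single, real component.
  addChild (PosixPath abs' cs) c
    | null c || c `elem` [".", ".."] || '/' `elem` c =
        fail ("not a single child name: " ++ show c)
    | otherwise = pure (PosixPath abs' (cs ++ [c]))
  -- Only a relative path can be appended.
  append (PosixPath abs' cs) (PosixPath False ds) = pure (PosixPath abs' (cs ++ ds))
  append _ _ = fail "cannot append an absolute path"
```

Using Maybe as the monad makes a failed addChild or append come back as Nothing, while IO would raise an exception instead.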
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
On Mon, Jul 17, 2006 at 03:07:51AM +0100, Neil Mitchell wrote: Hi Brian, I kind of expect that a Haskell library for file paths will use the type system to ensure some useful properties about the paths. I am specifically concentrating on type FilePath = String, because that is how it is defined by Haskell 98. And consequently that's how it works with readFile/writeFile/appendFile etc. Perhaps a far better solution to this would not be to hack these kinds of guarantees in at the filepath level, but have a restricted IO monad that only lets you perform operations inside certain directories, or only lets you read/write files. I know that both House and Halfs use these techniques. Without too much effort Yhc (for example) could be modified to perform restricted IO operations (only on certain directories etc). You seem to want to distinguish between relative, relative down only and absolute paths. By putting this in the filepath, and having different types for each, you pretty much guarantee that all standard functions will operate on all 3 types of path, so you don't gain any security that way, since mistakes will still slip through. How about adding something like restrictFilePaths :: FilePath -> IO () which will restrict the area that can be played with to that beneath the given FilePath? Darcs also does something similar (typeclasses for control of IO actions), and this is certainly the way to go. However, I also agree that type distinctions between paths would be nice. My preference has long been that the FilePath should be a class rather than a type. Then one could have single IO functions that accept restricted and unrestricted file paths, and other ones that accept only restricted file paths, so you could get compile-time checking that your safe chroot monad won't die at runtime. -- David Roundy
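David's suggestion of making FilePath a class, so that some IO functions accept any path while others accept only restricted ones, might look roughly like this. All names (IsPath, AnyPath, RestrictedPath, mkRestricted, splitOn, readAnyPath, readInSandbox) are hypothetical, RestrictedPath is assumed to be exported abstractly so that mkRestricted is the only way to build one, and the check is purely syntactic: it does nothing about symlinks inside the sandbox.

```haskell
import Data.List (intercalate)

-- Hypothetical sketch of "FilePath as a class".
class IsPath p where
  toFilePath :: p -> FilePath

-- An arbitrary native path, with no guarantees.
newtype AnyPath = AnyPath FilePath

-- A relative path with no ".." components; a real module would export the
-- type abstractly so that mkRestricted is the only way to build one.
newtype RestrictedPath = RestrictedPath [String]

instance IsPath AnyPath where
  toFilePath (AnyPath p) = p

instance IsPath RestrictedPath where
  toFilePath (RestrictedPath cs) = intercalate "/" cs

mkRestricted :: FilePath -> Maybe RestrictedPath
mkRestricted s
  | take 1 s == "/"   = Nothing   -- absolute
  | ".." `elem` parts = Nothing   -- could climb out of the sandbox
  | null cleaned      = Nothing   -- names nothing
  | otherwise         = Just (RestrictedPath cleaned)
  where
    parts   = splitOn '/' s
    cleaned = [p | p <- parts, p /= "", p /= "."]

splitOn :: Char -> String -> [String]
splitOn sep s =
  case break (== sep) s of
    (chunk, [])       -> [chunk]
    (chunk, _ : more) -> chunk : splitOn sep more

-- Accepts either kind of path.
readAnyPath :: IsPath p => p -> IO String
readAnyPath = readFile . toFilePath

-- Accepts only restricted paths, resolved beneath a sandbox root;
-- passing an AnyPath here is a type error.
readInSandbox :: FilePath -> RestrictedPath -> IO String
readInSandbox root rp = readFile (root ++ "/" ++ toFilePath rp)
```

Calling readInSandbox with an AnyPath is rejected by the type checker, which is the kind of compile-time guarantee described above.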
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
Hi Neil, On 7/17/06, Neil Mitchell [EMAIL PROTECTED] wrote: Hi Brian, You sent this email just to me, and not to the list. If you intended to send it to the list then feel free to forward my bits on to the list. I know that FilePath is defined by Haskell '98 as a String and so it cannot be changed. So, perhaps a new type or class should be created for this library (hereafter GoodPath, although I am not suggesting that is the best name). The problem is people will have to marshal their data into this GoodPath, and marshal it out again. When people can shortcut that marshalling, as the current readFile/writeFile definitions ensure they can, they will. At that point you lose all safety because people will abuse it. I disagree. It would be trivial to create a new module that exported new definitions of file IO actions that operated on GoodPath instead of FilePath, transparently delegating to the original readFile/writeFile/etc. until they could be removed in the future. This would also support the SuperFilePath idea you mentioned. Another thing I thought of would be a canonicalPath IO action (canonicalPath :: FilePath -> IO FilePath) that returns a FilePath that implements case-preserving, case-insensitive matching. For example, if there is a file named Hello There.txt in C:\, then (canonicalPath "c:\hello there.txt") would give "C:\Hello There.txt". I think that the xxxDrive functions should only be exported from System.FilePath.Windows and not System.FilePath, since it is unclear how they should be used effectively by cross-platform software. - Brian
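The proposed canonicalPath needs, at its core, a way to find the stored spelling of a name within one directory. Here is a minimal pure sketch, assuming the directory listing is supplied by the caller (in real code it would come from IO, for example listDirectory from the directory package). matchCase and resolveCase are hypothetical names, components are joined with '\', and plain toLower only approximates the case-folding rules a real filesystem uses.

```haskell
import Control.Applicative ((<|>))
import Data.Char (toLower)
import Data.List (find)

-- Hypothetical pure core of a case-preserving lookup: given the actual
-- entries of one directory, return the stored spelling of the requested
-- name, preferring an exact match over a case-insensitive one.
matchCase :: [FilePath] -> String -> Maybe FilePath
matchCase entries wanted =
  find (== wanted) entries <|> find sameIgnoringCase entries
  where
    lower = map toLower   -- only approximates real case folding
    sameIgnoringCase e = lower e == lower wanted

-- Resolve each component in turn against a directory-listing function
-- (an IO action in real code); Nothing if some component is missing.
resolveCase :: (FilePath -> [FilePath]) -> FilePath -> [String] -> Maybe [FilePath]
resolveCase _ _ [] = Just []
resolveCase listDir dir (c : cs) = do
  real <- matchCase (listDir dir) c
  rest <- resolveCase listDir (dir ++ "\\" ++ real) cs
  pure (real : rest)
```

With a listing in which C: contains Hello There.txt, matchCase returns "Hello There.txt" for "hello there.txt", which is the behaviour asked for in the example above.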
Re: [Haskell-cafe] RE: ANN: System.FilePath 0.9
Hi I disagree. It would be trivial to create a new module that exported new definitions of file IO actions that operated on GoodPath instead of FilePath, transparently delegating to the original readFile/writeFile/etc. until they could be removed in the future. This would also support the SuperFilePath idea you mentioned. Yes it would, but because readFile etc. are in the Prelude, it's not easy to avoid having them included. If someone were to write a System.SuperFilePath module and an IO.SuperFilePath module that would be great! I have considered it myself, but unfortunately don't have enough time at the moment. The advantage of moving to FilePath now is that it's entirely non-breaking for anything, and once we have SuperFilePath, it will be easier to migrate because (hopefully!) there will be fewer functions prodding directly at FilePaths as strings. Another thing I thought of would be a canonicalPath IO action (canonicalPath :: FilePath -> IO FilePath) that returns a FilePath that implements case-preserving, case-insensitive matching. For example, if there is a file named Hello There.txt in C:\, then (canonicalPath "c:\hello there.txt") would give "C:\Hello There.txt". Yes, that's a really good idea, and in fact when I wrote a FilePath module for Visual Basic (a long long time ago), I had such a function in it. I will make sure I add that tomorrow. I think that the xxxDrive functions should only be exported from System.FilePath.Windows and not System.FilePath, since it is unclear how they should be used effectively by cross-platform software. I would say they shouldn't be used at all, but it is true that Posix.setDrive c: is a bit poorly defined. I will think this idea over; maybe the drive functions shouldn't be exported from either the general module or the Posix one, but that breaks a nice symmetry that the library has...
I have added a wiki page discussing System.FilePath, http://haskell.org/haskellwiki/FilePath, which is more a personal todo list, but if people want to summarise/propose things then feel free :) Thanks Neil