Re: Filename literals
On 2009-Aug-18, at 7:20 am, Timothy S. Nelson wrote: On Tue, 18 Aug 2009, David Green wrote: Some ways in which different paths can be considered equivalent: Spelling: ... Simplification: ... Resolution: ... Content-wise: ... Ok, my next commit will have canonpath (stolen directly from p5's File::Spec documentation), which will do No physical check on the filesystem, but a logical cleanup of a path, and realpath (idea taken from p5's Cwd documentation), which will resolve symlinks, etc, and provide an absolute path. Oh, and resolvepath, which does both. I'm not quite sure I followed all your discussion above -- have I left something out? I think there's a difference between canonical as in a webpage with link rel=canonical, and cleanup as in Windows turning PROGRA~1 into Program Files. There could also be other types of normalisation depending on the FS, but we probably shouldn't concern ourselves with them, other than having some way to get to such native calls. Anyway, my assumption is that there should be a number of comparison options. Since we do Str, we should get string comparison for free. But I'm expecting other options at other levels, but have no idea how or what at this point. As Leon Timmermans keeps reminding us, that really should be delegated to the OS/FS. I think $file1 =:= $file2 should ask the OS whether it thinks those are the same item or not (it can check paths, it can check inodes, whatever is its official way to compare file-thingies). Similarly, $file1.name === $file2.name should ask the OS whether it thinks those names mean the same thing. And if you want to compare the canonical paths or anything else, just say $file1.name.canonical === $file2.name.canonical, or use 'eq', or whatever you want to do, just do it explicitly. According to my last commit, p{} will return a Path object that just stores the path, but has methods attached for accessing all the metadata. 
But it doesn't do file opening or things like that (unless you use the :T and :B thingies, which read the first block and try to guess whether it's text or binary -- these are in Perl 5 too). There are two things going on here: the user-friendly syntax for casual use, which we basically agree should be something short and pithy, although we have but begun to shed this bike, I'm sure. $file = io /foo/bar; $file = p{/foo/bar}; $file = Q:p/foo/bar/; $file = File(/foo/bar); However we end up spelling it, we want that to give us unified access to the separate inside parts: IO::Data# contents of file IO::Handle # filehandle for using manually IO::Metadata IO::Path I'm not sure why Path isn't actually just part of IO::Metadata... maybe it's just handy to have it out on its own because pathnames are so prominent. In any case, $file.size would just be shorthand for something like $file.io.metadata{size}. The :T and :B tests probably ought to be part of IO::Data, since they require opening the file to look at it; I'd rather put them there (vs. ::Metadata, which is all outside info) since plain ol' $file abstracts over that detail anyway. You can say $file.r, $file.x, $file.T, $file.B, and not care where those test live under the hood. We might actually want to distinguish IO::Metadata::Stat from IO::Metadata::Xattr or something... but that's probably too FS- specific. I don't think I mind much whether it's IO::Path or IO::Metadata::Path, or whether they both as exist as synonyms I think we want many of the same things, I'm just expressing them slightly differently. Let's keep working on this, and hopefully we end up with something great. Yes. A great mess! Er, wait, no And there's no perfect solution, but it would be useful for Perl to stick as closely as the FS/OS's idea of types as it can. Sometimes that would mean looking up an extension; it might mean using (or emulating) file magic; it might mean querying the FS for a MIME- type or a UTI. 
After all, the filename extension may not actually match the correct type of the file. My suggestion would be that it's an interesting idea, but should maybe be left to a module, since it's not a small problem. Of course, I'm happy to be overruled by a higher power :). I'd like the feature, I'm just unsure it deserved core status. Well, it's all modules anyway... certainly we'll have to rely on IO::Filesystem::XXX, but I do think this is another area to defer to the OS's own type-determining functions rather than try to do it all internally. What we should have, though, is a standard way to represent the types in Perl so that users know how to deal with them. I think roles are the obvious choice: if the OS tells you that a file is HTML, then $file would do IO::Datatype::HTML, which means in turn it would also do IO::Datatype::Plaintext, and so on. Of
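A minimal sketch of the role idea above, in which a file that does IO::Datatype::HTML also does IO::Datatype::Plaintext. It uses Python subclassing as a stand-in for Perl 6 roles. The class names and the `DATATYPES` table are hypothetical, and `mimetypes.guess_type` only inspects the extension, so it stands in for whatever type-determining call the OS or filesystem would really provide.

```python
import mimetypes


class Plaintext:
    """Stand-in for the thread's IO::Datatype::Plaintext role."""


class HTML(Plaintext):
    """Stand-in for IO::Datatype::HTML; anything HTML is also plain text."""


class Binary:
    """Fallback used when nothing more specific is known."""


# Hypothetical mapping from a reported MIME type to a datatype class.
DATATYPES = {"text/html": HTML, "text/plain": Plaintext}


def datatype_for(filename):
    # guess_type() looks only at the extension; a real implementation
    # would defer to the OS/FS, as the message suggests.
    mime, _encoding = mimetypes.guess_type(filename)
    return DATATYPES.get(mime, Binary)


page = datatype_for("index.html")()
print(isinstance(page, HTML), isinstance(page, Plaintext))
```

Code that only cares about text can test for `Plaintext` and will accept HTML files as well, which is the behaviour the message asks for.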
Re: Filename literals
I don't think $file1.name == $file2.name should talk to the FS, because I think File#name (or whatever) should return a plain Str. Having magical FilePathName objects is handy, but sometimes you want to get the filename as a dumb string to do stringish things without having to worry about the fact that the string started life as the name of a file somewhere. I could convert it explicitly, but it's not obvious that I need to; 'name' sounds like something that should return Str.

On 8/19/09, David Green david.gr...@telus.net wrote:
> [...]
> Similarly, $file1.name === $file2.name should ask the OS whether it thinks those names mean the same thing.
> [...]
Re: Filename literals
On Wed, 19 Aug 2009, Mark J. Reed wrote:

> I don't think $file1.name == $file2.name should talk to the FS, because I think File#name (or whatever) should return a plain Str. Having magical FilePathName objects is handy, but sometimes you want to get the filename as a dumb string to do stringish things without having to worry about the fact that the string started life as the name of a file somewhere. I could convert it explicitly, but it's not obvious that I need to; 'name' sounds like something that should return Str.

$file1.name == $file2.name is kinda strange because it does a numeric comparison between the filenames (see S03). Methinks you want $file1 eq $file2 (both of which are assumed to be of type Path) which does a string comparison between them without consulting the filesystem.

Having said that, you've made me realise that $file1 == $file2 might be the perfect operator for comparing inodes, since inodes are numbers.

:)

---------------------------------------------------------------------
| Name: Tim Nelson                 | Because the Creator is,        |
| E-mail: wayl...@wayland.id.au    | I am                           |
---------------------------------------------------------------------

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS d+++ s+: a- C++$ U+++$ P+++$ L+++ E- W+ N+ w--- V- PE(+) Y+++ PGP-+++ R(+) !tv b++ DI D G+ e++ h! y-
------END GEEK CODE BLOCK------
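To make the distinction between a string comparison and an inode comparison concrete, here is a small sketch in Python (a stand-in; none of this is the Perl 6 API under discussion). It assumes a filesystem that reports meaningful inode numbers. Note that an inode number is only unique per device, so the sketch compares the (device, inode) pair rather than the inode alone.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    plain = os.path.join(d, "notes.txt")
    dotted = os.path.join(d, ".", "notes.txt")  # same file, different spelling
    with open(plain, "w") as fh:
        fh.write("hello\n")

    # Pure string comparison, like `eq`: never touches the filesystem.
    same_string = plain == dotted

    # Identity comparison: ask the OS which file each name refers to.
    a, b = os.stat(plain), os.stat(dotted)
    same_inode = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

    # The library call that wraps the same check.
    same_file = os.path.samefile(plain, dotted)

print(same_string, same_inode, same_file)
```

The two spellings differ as strings but name the same file, so the first check and the other two disagree.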
Re: Filename literals
I should've mentioned, though, we're currently using the smartmatch operator for this, so I'm thinking maybe I'll just stick with that. :)
Re: Filename literals
On 2009-Aug-17, at 8:36 am, Jon Lang wrote: Timothy S. Nelson wrote: Well, my main thought in this context is that the stuff that can be done to the inside of a file can also be done to other streams -- TCP sockets for example (I know, there are differences, but the two are a lot the same), whereas metadata makes less sense in the context of TCP sockets; But any IO object might have metadata; some different from the metadata you traditionally get with files, and some the same, e.g. $io.size, $io.times{modified}, $io.charset, $io.type. if (path{/path/to/file}.e) { @lines = slurp(path{/path/to/file}); } (I'm using one of David's suggested syntaxes above, but I'm not closely attached to it). I suggested variations along the line of: io /path/to/file. It amounts to much the same thing, but it's important conceptually to distinguish a pathname from the thing it names. (A path doesn't have a modification date, a file does.) Also, special quoting/escaping could apply to other things, not limited to filenames. That said, I don't think it's unreasonable to want to combine both operations for brevity, but the io-constructor should have built-in path parsing, not the other way around. I guess what I'm saying here is that I think we can do the things without people having to worry about the objects being separate unless they care. So, separate objects, but hide it as much as possible. Is that something you're fine with? Yes -- to me that means some class/role that wraps up all the pieces together, but all the separate components are still there underneath. But I'm not too bothered about how it's implemented as long as it's transparent for casual use. my $file = io p[/some/file]; my $contents = $file.data; my $mod-date = $file.times{modified}; my $size = $file.size; Pathnames still are strings, so that's fine. In fact, there are different As for pathnames being strings, you may be right FSVO string. 
But I'd say that, while they may be strings, they're not Str, but they do Str Agreed, pathnames are almost strings, but worth distinguishing conceptually. There should be a URL type that does Str. Actually, there are other differences, like case-insensitivity and illegal chars. Unfortunately, those depend on the given filesystem. As long as you're dealing with one FS at a time, that's OK; it probably means we have IO::Name::ext3, IO::Name::NTFS, IO::Name::HFS, etc. But what happens when you cross FS-barriers? Does a case- sensitive name match a case-insensitive one? Is filename-equality not commutative or not transitive? If you're looking for a filename foo on Mac/Win, then a file actually called FOO matches; but on Unix it wouldn't. (Actually, Macs can do both IO::Name::HFS::case-insensitive and IO::Name::HFS::case-sensitive. Eek.) I'd like Perl 6's treatment of filenames to be smart enough that smart-matching any of these pairs of alternative spellings would result in a successful match. So while I'll agree that filenames are string-like, I really don't want them to _be_ strings. Well, the *files* are the same, but the pathnames are different. I'm not sure whether some differences in spelling should be ignored by default or not. There are actually several different kinds; S32 has a method realpath, but I think canonical is a better name, because aliases can be just as real as the canonical path, e.g. a web page with multiple addresses. Or hard links rather than soft links -- though in that case, there is no one canonical path. It may not even be possible to easily tell if there is one or not. Some ways in which different paths can be considered equivalent: Spelling: C:\PROGRA~1, case-insensitivity Simplification: foo/../bar/ to bar/ Resolution: of symlinks/shortcuts Content-wise: hard links/multiple addresses Depending on the circumstances, you might want any of those to count as the same file; or none of them. 
We'll need methods for each sort of transformation, $path.canonical, $path.normalize, $path.simplify, etc. Two high-level IO objects are the same, regardless of path, if $file1 =:= $file2 (which might compare inodes, etc.). There should be a way to set what level of sameness applies in a given lexical scope; perhaps the first two listed above are a reasonable default to start with. There's something that slightly jars me here... I don't like the quotation returning an IO object. But doesn't normal quoting return a Str object? And regex quoting return an object (Regex? Match? Something, anyway). Certainly, but a regex doesn't produce a Signature object, say. I don't object to objects, just to creating objects, then doing something with them, then returning another kind of object, and calling that parsing. If we're parsing the characters, we should end up with an IO::Name. If
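The cross-filesystem question raised above (does foo match FOO?) can be illustrated with Python's pure path classes, used here purely as an analogy to the proposed IO::Name::* types. They model naming rules without touching any disk.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Case-sensitive naming rules: different spellings are different names.
posix_match = PurePosixPath("foo") == PurePosixPath("FOO")

# Case-insensitive naming rules: the same two spellings compare equal.
windows_match = PureWindowsPath("foo") == PureWindowsPath("FOO")

# Names governed by different rules never compare equal, even when the
# spelling is identical, which sidesteps the commutativity question.
cross_match = PurePosixPath("foo") == PureWindowsPath("foo")

print(posix_match, windows_match, cross_match)
```

Refusing to equate names from different flavours is one possible answer to the question of what happens across FS barriers; it is a design choice of that library, not something the thread had settled.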
Re: Filename literals
Reading this discussion, I'm getting the feeling that filename literals are increasingly getting magical, something that I don't think is a good development. The only sane way to deal with filenames is treating them as opaque binary strings, making any more assumptions is bound to get you into trouble. I don't want to deal with Windows' strange restrictions on characters when I'm working on Linux. I don't want to deal with any other platform's particularities either. Portability should be positive, not negative IMNSHO. As for comparing paths: reimplementing logic that belongs to the filesystem sounds like really Bad Idea™ to me. Two paths can't be reliably compared without choosing to make some explicit assumptions, and I don't think Perl should make such choices for the programmer. Leon Timmermans
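For comparison, this is roughly what treating filenames as opaque binary strings looks like in practice. The sketch uses Python on a POSIX system (an assumption; the behaviour differs on Windows), where `os.fsdecode` and `os.fsencode` carry undecodable bytes through a text string without loss.

```python
import os

raw = b"caf\xe9.txt"            # Latin-1 bytes; not valid UTF-8

text = os.fsdecode(raw)         # undecodable bytes are smuggled into the str
round_trip = os.fsencode(text)  # and restored unchanged on the way back

print(repr(text), round_trip == raw)
```

The name survives the round trip byte for byte, so the program never has to decide what encoding the filename "really" is.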
Re: Filename literals
On Tue, 18 Aug 2009, David Green wrote: On 2009-Aug-17, at 8:36 am, Jon Lang wrote: Timothy S. Nelson wrote: Well, my main thought in this context is that the stuff that can be done to the inside of a file can also be done to other streams -- TCP sockets for example (I know, there are differences, but the two are a lot the same), whereas metadata makes less sense in the context of TCP sockets; But any IO object might have metadata; some different from the metadata you traditionally get with files, and some the same, e.g. $io.size, $io.times{modified}, $io.charset, $io.type. Ok, now you're giving me ideas :). [snipped a bit and moved it further down the e-mail] I guess what I'm saying here is that I think we can do the things without people having to worry about the objects being separate unless they care. So, separate objects, but hide it as much as possible. Is that something you're fine with? Yes -- to me that means some class/role that wraps up all the pieces together, but all the separate components are still there underneath. But I'm not too bothered about how it's implemented as long as it's transparent for casual use. my $file = io p[/some/file]; my $contents = $file.data; my $mod-date = $file.times{modified}; my $size = $file.size; That sounds like the kind of thing I'm heading for. Pathnames still are strings, so that's fine. In fact, there are different As for pathnames being strings, you may be right FSVO string. But I'd say that, while they may be strings, they're not Str, but they do Str Agreed, pathnames are almost strings, but worth distinguishing conceptually. There should be a URL type that does Str. Actually, there are other differences, like case-insensitivity and illegal chars. Unfortunately, those depend on the given filesystem. As long as you're dealing with one FS at a time, that's OK; it probably means we have IO::Name::ext3, IO::Name::NTFS, IO::Name::HFS, etc. But what happens when you cross FS-barriers? 
Does a case-sensitive name match a case-insensitive one? Is filename-equality not commutative or not transitive? If you're looking for a filename foo on Mac/Win, then a file actually called FOO matches; but on Unix it wouldn't. (Actually, Macs can do both IO::Name::HFS::case-insensitive and IO::Name::HFS::case-sensitive. Eek.) I think it should depend on the set of constraints involved. I'd like Perl 6's treatment of filenames to be smart enough that smart-matching any of these pairs of alternative spellings would result in a successful match. So while I'll agree that filenames are string-like, I really don't want them to _be_ strings. Well, the *files* are the same, but the pathnames are different. I'm not sure whether some differences in spelling should be ignored by default or not. There are actually several different kinds; S32 has a method realpath, but I think canonical is a better name, because aliases can be just as real as the canonical path, e.g. a web page with multiple addresses. Or hard links rather than soft links -- though in that case, there is no one canonical path. It may not even be possible to easily tell if there is one or not. Some ways in which different paths can be considered equivalent: Spelling: C:\PROGRA~1, case-insensitivity Simplification: foo/../bar/ to bar/ Resolution: of symlinks/shortcuts Content-wise: hard links/multiple addresses Depending on the circumstances, you might want any of those to count as the same file; or none of them. We'll need methods for each sort of transformation, $path.canonical, $path.normalize, $path.simplify, etc. Two high-level IO objects are the same, regardless of path, if $file2 =:= $file2 (which might compare inodes, etc.). There should be a way to set what level of sameness applies in a given lexical scope; perhaps the first two listed above are a reasonable default to start with. 
Ok, my next commit will have canonpath (stolen directly from p5's File::Spec documentation), which will do No physical check on the filesystem, but a logical cleanup of a path, and realpath (idea taken from p5's Cwd documentation), which will resolve symlinks, etc, and provide an absolute path. Oh, and resolvepath, which does both. I'm not quite sure I followed all your discussion above -- have I left something out? Anyway, my assumption is that there should be a number of comparison options. Since we do Str, we should get string comparison for free. But I'm expecting other options at other levels, but have no idea how or what at this point. There's something that slightly jars me here... I don't like the quotation returning an IO object. But doesn't normal quoting return a Str object? And regex quoting return an object (Regex? Match? Something, anyway). Certainly, but a regex doesn't produce a Signature object, say. I don't object to objects, just to creating objects,
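A rough illustration of the canonpath/realpath split described above, with Python's standard library as a stand-in for the Perl 6 methods. It assumes a POSIX system where symlinks can be created. One caveat: `os.path.normpath` collapses `..` segments, whereas Perl 5's File::Spec canonpath deliberately leaves them alone on Unix, so the correspondence is only approximate.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    d = os.path.realpath(d)            # in case the temp dir is itself a symlink
    target = os.path.join(d, "real")
    link = os.path.join(d, "alias")
    os.mkdir(target)
    os.symlink(target, link)           # alias -> real

    messy = os.path.join(link, "sub", "..", ".", "file.txt")

    # Logical cleanup only: no filesystem access, the symlink stays in place.
    cleaned = os.path.normpath(messy)

    # Physical resolution: follows the symlink and returns an absolute path.
    resolved = os.path.realpath(messy)

print(cleaned)
print(resolved)
```

The cleaned path still goes through `alias`, while the resolved path goes through `real`. Python's `realpath` also normalises as it goes, so it covers what the message calls resolvepath.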
Re: Filename literals
Leon (>):
> Reading this discussion, I'm getting the feeling that filename literals are increasingly getting magical, something that I don't think is a good development. The only sane way to deal with filenames is treating them as opaque binary strings, making any more assumptions is bound to get you into trouble. I don't want to deal with Windows' strange restrictions on characters when I'm working on Linux. I don't want to deal with any other platform's particularities either. Portability should be positive, not negative IMNSHO.
>
> As for comparing paths: reimplementing logic that belongs to the filesystem sounds like really Bad Idea™ to me. Two paths can't be reliably compared without choosing to make some explicit assumptions, and I don't think Perl should make such choices for the programmer.

Very nicely put. We can't predict the future, but in creating something that'll at least persist through the next decade, let's not do elaborate things with lots of moving parts.

Let's make a solid ground to stand on; something so stable that it works uphill and underwater. People with expertise and tuits will write the facilitating modules.

<PerlJam> To quote Kernighan and Pike: Simplicity. Clarity. Generality.
<moritz_> I agree.
<Matt-W> magic can always be added with module goodness

// Carl
Re: Filename literals
On Tue, Aug 18, 2009 at 3:20 PM, Carl Mäsak cma...@gmail.com wrote:
> Let's make a solid ground to stand on; something so stable that it works uphill and underwater. People with expertise and tuits will write the facilitating modules.
>
> <PerlJam> To quote Kernighan and Pike: Simplicity. Clarity. Generality.
> <moritz_> I agree.
> <Matt-W> magic can always be added with module goodness

I agree with this principle. The discussion has been (and probably still will be) fruitful anyway, if only in illuminating the challenges of multi-platform and multi-filesystem support, and what we need to consider for it and how.

--
Jan
Re: Filename literals
+1

Carl Mäsak wrote:
> Very nicely put. We can't predict the future, but in creating something that'll at least persist through the next decade, let's not do elaborate things with lots of moving parts.
>
> Let's make a solid ground to stand on; something so stable that it works uphill and underwater. People with expertise and tuits will write the facilitating modules.
>
> <PerlJam> To quote Kernighan and Pike: Simplicity. Clarity. Generality.
> <moritz_> I agree.
> <Matt-W> magic can always be added with module goodness
Re: Filename literals
On Tue, 18 Aug 2009, Leon Timmermans wrote:

> Reading this discussion, I'm getting the feeling that filename literals are increasingly getting magical, something that I don't think is a good development. The only sane way to deal with filenames is treating them as opaque binary strings, making any more assumptions is bound to get you into trouble. I don't want to deal with Windows' strange restrictions on characters when I'm working on Linux. I don't want to deal with any other platform's particularities either. Portability should be positive, not negative IMNSHO.

Sounds to me like you need p:bin{/path/to/file} -- that does what you want it to. I'll make it more obvious in the S16 documentation.

> As for comparing paths: reimplementing logic that belongs to the filesystem sounds like really Bad Idea™ to me. Two paths can't be reliably compared without choosing to make some explicit assumptions, and I don't think Perl should make such choices for the programmer.

That's why I want multiple comparison options, so that people have to explicitly choose what they want. How to do this, though, I'm unsure.

:)
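One way the "multiple comparison options" could look, sketched in Python rather than Perl 6. The `Sameness` enum and the `same_path` function are hypothetical names invented for this illustration; the levels loosely follow the kinds of equivalence listed earlier in the thread (simplification, resolution, content-wise identity), and spelling rules such as case-insensitivity are left out.

```python
import os
from enum import Enum


class Sameness(Enum):
    STRING = 1      # identical spelling, nothing else
    SIMPLIFIED = 2  # equal after collapsing "." and ".." without disk access
    RESOLVED = 3    # equal after resolving symlinks to absolute paths
    IDENTITY = 4    # whatever the OS considers the same file


def same_path(a, b, level):
    """Compare two paths at an explicitly chosen level of sameness."""
    if level is Sameness.STRING:
        return a == b
    if level is Sameness.SIMPLIFIED:
        return os.path.normpath(a) == os.path.normpath(b)
    if level is Sameness.RESOLVED:
        return os.path.realpath(a) == os.path.realpath(b)
    return os.path.samefile(a, b)  # raises OSError if either file is missing


print(same_path("foo/../bar", "bar", Sameness.STRING))
print(same_path("foo/../bar", "bar", Sameness.SIMPLIFIED))
```

Because the level is a required argument, the caller always states which assumption is being made, which is the point both sides of the exchange agree on.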
Re: Filename literals
On Tue, Aug 18, 2009 at 15:20, Carl Mäsak cma...@gmail.com wrote:
> Leon (>):
>> Reading this discussion, I'm getting the feeling that filename literals are increasingly getting magical, something that I don't think is a good development. The only sane way to deal with filenames is treating them as opaque binary strings, making any more assumptions is bound to get you into trouble. I don't want to deal with Windows' strange restrictions on characters when I'm working on Linux. I don't want to deal with any other platform's particularities either. Portability should be positive, not negative IMNSHO.

The whole reason filenames/paths are a mess to code is that they are treated as binary strings in most cases. This is also why we have modules like File::Spec and a bunch more on CPAN all trying to do the same thing. And today if I want to code something that works on all platforms I have to use those instead. How can this be positive?

For me a Path literal is a way to get rid of all this bandaging so we don't have to bother with the strange restrictions later when we get a bug report from a CPAN user. And there is nothing magical about it, no more so than when I ask for the length of a UTF8 string and expect to get back the number of characters, not the number of bytes. A path is a well-defined entity on all platforms and should be treated as such.

The main problem is that POSIX never really covered this part too well. But today we have Unicode and UTF8, and this is the de facto default on most modern Unixes, as most libraries and tools will write filenames in this format if so defined in the locale. Just writing binary data to a filename is bound to get you into trouble, and you will quickly find that many of the common C libraries will fail if locale and filename do not match.

So even on Linux/Unix a path is really not just any number of bytes with / as delimiter. It depends on the locale and the encoding set for the file system, and not caring about that will get you into trouble.

But then again you always have the option of using p:unix{}; it's also a clear way to signal that you really don't care about portability and that this will only work on Unix. Or you could even use Q{}, as this pretty much will allow you to do anything.

>> As for comparing paths: reimplementing logic that belongs to the filesystem sounds like really Bad Idea™ to me. Two paths can't be reliably compared without choosing to make some explicit assumptions, and I don't think Perl should make such choices for the programmer.

Getting any kind of path from user input will require you to reimplement that logic if you care about validating data before throwing it at the file system. If you buy that paths are well-defined types, then comparing paths should not require making any assumptions. We can compare Unicode strings without making assumptions.

> Very nicely put. We can't predict the future, but in creating something that'll at least persist through the next decade, let's not do elaborate things with lots of moving parts.
>
> Let's make a solid ground to stand on; something so stable that it works uphill and underwater. People with expertise and tuits will write the facilitating modules.
>
> <PerlJam> To quote Kernighan and Pike: Simplicity. Clarity. Generality.
> <moritz_> I agree.
> <Matt-W> magic can always be added with module goodness

I completely agree we can't predict the future, but we do have to make some sane choices about how the default should work. Who knows if UTF8 will still be the hot new thing in 10 years, but that's still the default assumption for much of Perl 6 if nothing else is known about the input we get.

And I totally agree path literals should not be magical; they should be well defined, and you should not suffer when using them because platform X or Y has strange restrictions. But when finding the sane default we have to make restrictions, and POSIX's "a path is binary data" is simply too lax.

My idea about using the lowest common denominator for modern Unix and Windows was that we could get as much of Unicode in path names as possible without breaking on modern platforms, and as a way to get Simplicity, Clarity and Generality into paths.

Because this will never be simple, clear or general:

File::Spec->catfile(qw(.. ext Sys Syslog macros.all));

or any of the other examples that we can find:

http://www.google.com/codesearch?hl=enstart=10sa=Nq=FIle::Spec-%3Ecatfile

Regards
Troels
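For a sense of what a first-class path type buys over catfile-style joining, here is the same five-part path built with Python's pure path classes. This is an analogy only, not the proposed Perl 6 literal; pure paths do no I/O, so both flavours can be exercised on any platform.

```python
from pathlib import PurePosixPath, PureWindowsPath

parts = ("..", "ext", "Sys", "Syslog", "macros.all")

# The same components rendered under each platform's rules.
posix = PurePosixPath(*parts)
windows = PureWindowsPath(*parts)

# A single literal spelled with "/" is also accepted by the Windows flavour
# and converted to its native separator.
from_literal = PureWindowsPath("../ext/Sys/Syslog/macros.all")

print(posix)         # ../ext/Sys/Syslog/macros.all
print(windows)       # ..\ext\Sys\Syslog\macros.all
print(from_literal)  # ..\ext\Sys\Syslog\macros.all
```

One portable spelling in the source, with the platform-specific rendering chosen by the type, is close to what the message argues a path literal should provide.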
Re: Filename literals
Leon Timmermans wrote:
> Reading this discussion, I'm getting the feeling that filename literals are increasingly getting magical, something that I don't think is a good development. [...]. I don't want to deal with Windows' strange restrictions on characters when I'm working on Linux. I don't want to deal with any other platform's particularities either.

I'd like to agree, and also suggest that the use-case for filename literals probably favors the native approach. Most applications should not hard-code filename constants: they should use config files, ask users, or programmatically construct them from other information. OTOH, one-liners and throw-away scripts will frequently hard-code these things. So the filename literal syntax should optimize for this usage: a 90% solution that makes easy things trivial (but requires hard things to use an IO CTOR) seems to me to be what I'd want for a one-liner.
Re: Filename literals
Timothy S. Nelson wrote: David Green wrote: Jon Lang wrote: If so, could you give some examples of how such a distinction could be beneficial, or of how the lack of such a distinction is problematic? Well, my main thought in this context is that the stuff that can be done to the inside of a file can also be done to other streams -- TCP sockets for example (I know, there are differences, but the two are a lot the same), whereas metadata makes less sense in the context of TCP sockets; I guess this was one of the thoughts that led me to want separate things here. Ah. I can see that. Well, I definitely think there needs to be a class that combines the inside and the outside, or the data and the metadata. Certainly the separate parts will exist separately for purposes of implementation, but there needs to be a user-friendlier view wrapped around that. Or maybe there are (sort of) three levels, low, medium, and high; that is, the basic implementation level (=P6 direct access to OS- and FS- system calls); the combined level, where an IO or File object encompasses IO::FSnode and IO::FSdata, etc.; and a gloss-over-the-details level with lots of sugar on top (at the expense of losing control over some details). Hmm. With the quoting idea, I don't see the need for a both type of object. I mean, I'd see the code happening something like this: if (path{/path/to/file}.e) { @lines = slurp(path{/path/to/file}); } Or... if (path{/path/to/file}.e) { $handle = open(path{/path/to/file}); } (I'm using one of David's suggested syntaxes above, but I'm not closely attached to it). For the record, the above syntax was my suggestion. I guess what I'm saying here is that I think we can do the things without people having to worry about the objects being separate unless they care. So, separate objects, but hide it as much as possible. Is that something you're fine with? It looks good to me.
In fact, having q, Q, or qq involved at all strikes me as wrong, since those three are specifically for generating strings. Pathnames still are strings, so that's fine. In fact, there are different Hmm. I'm not so sure; maybe I'm just being picky, but I want to clarify things in case it's important (in other words, I'm thinking out loud here to see if it helps). First, Q and friends don't generate strings, they generate string-like objects, which could be Str, or Match, or whatever. Think of quoting constructs as a way of temporarily switching to a different sublanguage (cf. regex), and you'll have the idea that I have in mind. As for pathnames being strings, you may be right FSVO string. But I'd say that, while they may be strings, they're not Str, but they do Str, as in role IO::FSNode does Str {...} (FSNode may not be the right name here, but is used for illustrative purposes). I'd go one step further. Consider the Windows path 'C:\Program Files\'. Is the string what's really important, or is it the directory to which the string refers? I ask because, for legacy reasons, the following points to the same directory: 'C:\PROGRA~1\'. Then there's the matter of absolute and relative paths: if the current working directory is 'C:\Program Files\', then the path 'thisfile' actually refers to 'C:\Program Files\thisfile'. And because of parent directory and self-reference links, things like '/bin/../etc/.' is just an overcomplicated way of pointing to '/etc'. I'd like Perl 6's treatment of filenames to be smart enough that smart-matching any of these pairs of alternative spellings would result in a successful match. So while I'll agree that filenames are string-like, I really don't want them to _be_ strings. things going on here; one is to have a way of conveniently quoting strings that contain a lot of backslashes. 
Just as Perl lets you pick different quotation marks, to make it easier to quote strings that have a lot of " or ' characters, so it should have a way to make it easy to quote strings with a lot of backslashes. (The most obvious example being Windows paths; but there are other possibilities, such as needing to eval some code that already has a lot of backslashes in it.) Now, you can already turn backwhacking on or off via Q's :backslash adverb; Q:qq includes :b (and Q:q recognises a few limited escape sequences like \\). So you could say Q[C:\some\path], and you could add scalar interpolation to say Q:s[C:\some\path\$filename]. But there's no way to have all of: literal backslashes + interpolation + escaped sigils. Perhaps instead of a simple :b toggle, we could have an :escape<Str> adverb that defaults to :escape<\>? Then you could have Q:scalar:escape("^")[C:\path\with\literal^$\$filename].

Maybe a global variable? It's an interesting idea, and I'll see how others feel :).

I'm leery of global variables, per se; but I _do_ like the idea of
Re: Filename literals
Hey,

Just joined the list, and I too have been thinking about a good path literal for Perl 6. Nice to see so many other people are thinking the same :).

Not knowing where to start in this long thread, I will instead try to show how I would like a path literal to work. For me a path literal is a way to make the code pretty and clean, and multi-platform coding is where this gets hardest to do. So I think a path literal should make it possible to use both a native style and a more modern portable one, without having to give up using spaces like in Path::Spec from Perl 5 or having to do verbose object creation.

First, I think extending Q with a Q:path{} and making the aliases Q:p{} and p{} would be the most consistent with the current string literal API. Also it should be possible to subtype the literals to further limit format and content. This should be done so we can get a compile-time error when paths are known to be incorrect, or throw an exception or return an undef with an error type (or whatever Larry called it) when we interpolate and end up with something that is known to be incorrect.

The default p{} should only allow / as separator and should not allow characters that won't work on modern Windows and Unix, like \ / ? % * : |, etc. The reason for this is that portable Paths should be the default, and if you really need platform-specific behavior it should be shown in the code.

    my Path $path = p{../ext/dictonary.txt};

or

    my Path $path = p{c:/ext/dictonary.txt};

We should allow Windows-style paths so converting and maintaining code on this platform is not a pain.

    my Path $path = p:win{C:\Program Files\MS Access\file.file};

For Unix-specific behavior we should have a p:unix{} literal; here the only limits are what is defined by the locale. So we won't be able to write full Unicode if the locale is set to Latin1. Writing filenames to the filesystem that other programs won't be able to read should be hard.
    my Path $path = p:unix{/usr/src/bla/myfile?:%.file};

And for people where this is a problem, p:bin{} can be used, as no checking is done here.

    my $path = p:bin{/usr/src/bla/??/adasd/myfile};

Old-style Mac paths could also be supported, where the : is used as separator.

    my Path $path = p:mac{usr:src:bla};

Or old DOS paths, where 8-char limits and all the old DOS stuff apply.

    my Path $path = p:dos{c:\windows\test.fil};

URLs could also be supported with:

    my Path $path = p:url{file:///home/test.file}

** Path Object like File::Spec, etc. just nicer **

All the different variants of p{} return a Path object that offers much of what is found in File::Spec, Cwd and Path::Class in Perl 5 today, in a more Perl 6 way.

    my Path $real_path = $path.realpath;  # Like Cwd's realpath
    my Path $volume = $path.volume;       # Returns the volume part if relevant
    my Path $dir = $path.dir;             # Returns the directory part
    my Path $file = $path.file;           # Returns the file part
    $path.shift();                        # Get rid of first part of path
    $path.pop();                          # Get rid of last part of path
    my @paths = $path.dirs;               # Returns the directory parts of the path

etc.

** Comparing Paths should do the right thing **

As we have the option of specifying what type a Path object is, this should also count when comparing them. So, for example, p:win{} paths are case-insensitive.

    my $file = p:win{c:\My File.txt};
    my $path = p:win{C:\Program Files\..};
    if($path.is_in($file)) {  # Check if the path is contained in another path
        say "$file is in $path\n";  # C:\My File.txt is in C:
    }
    if(p{../test} ~~ p{../dir/../test}) {
        say "Comparing two Path works as it should";
    }

Also Path handles Unicode normalization so this won't be a problem: http://lists.zerezo.com/git/msg643117.html

Meaning that both Märchen (with a precomposed ä) and Märchen (a followed by a combining diaeresis) are the same path, but without normalizing the path behind the user's back.

** Utility functions **

Path in itself knows nothing about the filesystem and files but might have a peek in $*CWD to do some path logic.
Apart from that, a number of File-related functions might be available to make it easy to open and slurp the file a Path points to.

    my File $file = p{/etc/passwd}.open;
    if($file.type ~~ 'text/plain') { say "looks like a password file"; }
    my @passwd = p{/etc/passwd}.lines;
    if(p{/etc/passwd}.exists) { say "passwd file exists"; }

These are my thoughts so far; hope it helps the discussion.

Regards
Troels
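The three comparison rules proposed above (per-flavour case sensitivity, purely textual cleanup, and Unicode-insensitive matching) can be sketched with an existing library. This is only an illustration in Python, chosen because the p{} syntax is still hypothetical; the helper names `same_lexical_path` and `same_unicode_name` are made up for this sketch and are not part of any proposal in the thread.

```python
import posixpath
import unicodedata
from pathlib import PurePosixPath, PureWindowsPath

# Windows-flavoured paths compare case-insensitively, like the proposed p:win{}.
assert PureWindowsPath(r"C:\My File.txt") == PureWindowsPath(r"c:\my file.TXT")

# POSIX-flavoured paths stay case-sensitive.
assert PurePosixPath("/etc/Passwd") != PurePosixPath("/etc/passwd")


def same_lexical_path(a: str, b: str) -> bool:
    """Compare two POSIX paths after a purely textual cleanup (no filesystem access)."""
    return posixpath.normpath(a) == posixpath.normpath(b)


def same_unicode_name(a: str, b: str) -> bool:
    """Compare names after NFC normalisation, leaving the original strings untouched."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_lexical_path("../test", "../dir/../test"))
print(same_unicode_name("M\u00e4rchen", "Ma\u0308rchen"))
```

Note that the textual cleanup collapses `dir/..` without consulting the filesystem, so it can give the wrong answer when `dir` is a symlink; that is the same trade-off as canonpath versus realpath discussed earlier in the thread.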
Re: Filename literals
Troels Liebe Bentsen wrote: Hey, Just joined the list, and I too have been thinking about a good path literal for Perl 6. Nice to see so many other people are thinking the same :).

Welcome to the list!

Not knowing where to start in this long thread, I will instead try to show how I would like a path literal to work.

A well-considered proposal, and one with which I mostly agree. Some thoughts:

The default p{} should only allow / as separator and should not allow characters that won't work on modern Windows and Unix like \ / ? % * : | , etc. The reason for this is that portable Path's should be the default and if you really need platform specific behavior it should be shown in the code.

I note that you explicitly included * and ? in the list of forbidden characters; I take it, then, that you're not in favor of Path as a glob-based pattern-matching utility? E.g.:

    my Path $path;
    ...
    unless $path ~~ p"astro*" { say "the file doesn't begin with 'astro'." }

Admittedly, this particular example _could_ be accomplished through the use of a regex; but there _are_ cases where the use of wildcard characters would be easier than the series of equivalent tests that Perl would otherwise have to perform in order to achieve the same result. Hmm... maybe we need something analogous to q vs. qq; that is:

    p"astro*"   #`{ syntax error: '*' is not a valid filename character. }
    pp"astro*"  #`{ returns an object that is used for Path pattern-matching; perhaps Pathglob or somesuch? }

We should allow windows style paths so converting and maintaining code on this platform is not a pain.
:
For Unix specific behavior we should have a p:unix{} literal, here the only limit are what is defined by locale.
:
And for people where this is a problem p:bin{} can be used as no checking is done here.
:
Old style Mac paths could also be supported where the : is used as separator.
:
Or old dos paths where 8 char limits and all the old dos stuff apply.

Hear, hear.
Note that these are all mutually exclusive, which suggests that the proper format ought to be something like:

    my Path $path = p:format<win>{C:\Program Files}

However, I have no problem with the idea that :win is short for :format<win>; the feature here is brevity.

Urls could also be support with: my Path $path = p:url{file:///home/test.file}

I would be very careful here, in that I wouldn't want to open the can of worms inherent in non-file protocols (e.g., ftp, http, gopher, mail), or even in file protocols with hosts other than localhost.

** Path Object like File::Spec, etc. just nicer **
:
** Comparing Paths should do the right thing **

Agreed on all counts.

** Utility functions ** Path in itself knows nothing about the filesystem and files but might have a peek in $*CWD to do some path logic. Except for that a number of File related functions might be available to make it easy to open and slurp a file a Path points to.

    my File $file = p{/etc/passwd}.open;
    if($file.type ~~ 'text/plain') { say "looks like a password file"; }
    my @passwd = p{/etc/passwd}.lines;
    if(p{/etc/passwd}.exists) { say "passwd file exists"; }

As soon as you allow methods such as .exists, it undermines your claim that Path knows nothing about the filesystem or files. IMHO, you should still include such methods.

-- Jonathan "Dataweaver" Lang
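The glob-style match suggested above (a separate literal whose value is a pattern rather than a single path) might behave roughly like the following. This is a hedged sketch in Python using the standard `fnmatch` module; `name_matches` is a hypothetical helper, and matching only the final path component is an assumption made for the example, not something the thread settles.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def name_matches(path: str, pattern: str) -> bool:
    """Glob-match only the final component of a path, case-sensitively."""
    return fnmatchcase(PurePosixPath(path).name, pattern)


print(name_matches("/data/astronomy.txt", "astro*"))
print(name_matches("/data/biology.txt", "astro*"))
```

Keeping the pattern type distinct from the plain path type means a literal containing * or ? can still be rejected as a filename while remaining valid as a pattern.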
Re: Filename literals
On Mon, 17 Aug 2009, Jon Lang wrote: Well, I definitely think there needs to be a class that combines the inside and the outside, or the data and the metadata. Certainly the separate parts will exist separately for purposes of implementation, but there needs to be a user-friendlier view wrapped around that. Or maybe there are (sort of) three levels, low, medium, and high; that is, the basic implementation level (=P6 direct access to OS- and FS- system calls); the combined level, where an IO or File object encompasses IO::FSnode and IO::FSdata, etc.; and a gloss-over-the-details level with lots of sugar on top (at the expense of losing control over some details).

Hmm. With the quoting idea, I don't see the need for a "both" type of object. I mean, I'd see the code happening something like this:

    if (path{/path/to/file}.e) { @lines = slurp(path{/path/to/file}); }

Or...

    if (path{/path/to/file}.e) { $handle = open(path{/path/to/file}); }

(I'm using one of David's suggested syntaxes above, but I'm not closely attached to it).

For the record, the above syntax was my suggestion.

Ok, as long as I don't have to take the blame :). Seriously, I was confused by trying to reply to two e-mails at once. Sorry.

In fact, having q, Q, or qq involved at all strikes me as wrong, since those three are specifically for generating strings.

Pathnames still are strings, so that's fine. In fact, there are different

Hmm. I'm not so sure; maybe I'm just being picky, but I want to clarify things in case it's important (in other words, I'm thinking out loud here to see if it helps). First, Q and friends don't generate strings, they generate string-like objects, which could be Str, or Match, or whatever. Think of quoting constructs as a way of temporarily switching to a different sublanguage (cf. regex), and you'll have the idea that I have in mind. As for pathnames being strings, you may be right FSVO string.
But I'd say that, while they may be strings, they're not Str, but they do Str, as in role IO::FSNode does Str {...} (FSNode may not be the right name here, but is used for illustrative purposes). I'd go one step further. Consider the Windows path 'C:\Program Files\'. Is the string what's really important, or is it the directory to which the string refers? I ask because, for legacy reasons, the following points to the same directory: 'C:\PROGRA~1\'. Then there's the matter of absolute and relative paths: if the current working directory is 'C:\Program Files\', then the path 'thisfile' actually refers to 'C:\Program Files\thisfile'. And because of parent directory and self-reference links, things like '/bin/../etc/.' is just an overcomplicated way of pointing to '/etc'. I'd like Perl 6's treatment of filenames to be smart enough that smart-matching any of these pairs of alternative spellings would result in a successful match. So while I'll agree that filenames are string-like, I really don't want them to _be_ strings. Good ideas. But I still want it to have the same interface, so I can concatenate them easily in error messages :). things going on here; one is to have a way of conveniently quoting strings that contain a lot of backslashes. Just as Perl lets you pick different quotation marks, to make it easier to quote strings that have a lot of or ' characters, so it should have a way to make it easy to quote strings with a lot of backslashes. (The most obvious example being Windows paths; but there are other possibilities, such as needing to eval some code that already has a lot of backslashes in it.) Now, you can already turn backwhacking on or off via Q's :backslash adverb; Q:qq includes :b (and Q:q recognises a few limited escape sequences like \\). So you could say Q[C:\some\path], and you could add scalar interpolation to say Q:s[C:\some\path\$filename]. But there's no way to have all of: literal backslashes + interpolation + escaped sigils. 
Perhaps instead of a simple :b toggle, we could have an :escapeStr adverb that defaults to :escape\? Then you could have Q:scalar:escape(^)[C:\path\with\literal^$\$filename]. Maybe a global variable? It's an interesting idea, and I'll see how others feel :). I'm leery of global variables, per se; but I _do_ like the idea of lexically-scoped options that let you customize the filename syntax. Changing the default delimiter would be the most common example of this. Yeah, global variable is probably a bad idea. But it *feels* like it should be some kind of global or semi-global setting :). By semi-global, I mean something that you can override in your local scope, and have it revert, much as with the $*IN, etc, filehandles. Now, isn't Q:path[/some/file] just creating an IO object? Unlike /foo/, where foo just IS the pattern, /some/file is *not* an IO object, it's just a filename. So if the special path-quoting returned an
Re: Filename literals
On Sun, 16 Aug 2009, David Green wrote: On 2009-Aug-15, at 9:22 am, Jon Lang wrote: IOW, your outside the file stuff is whatever can be done without having to open the file, and your inside the file is whatever only makes sense once the file has been opened. Correct? Pretty much, yes. If so, could you give some examples of how such a distinction could be beneficial, or of how the lack of such a distinction is problematic? Well, my main thought in this context is that the stuff that can be done to the inside of a file can also be done to other streams -- TCP sockets for example (I know, there are differences, but the two are a lot the same), whereas metadata makes less sense in the context of TCP sockets; I guess this was one of the thoughts that led me to want separate things here. Well, I definitely think there needs to be a class that combines the inside and the outside, or the data and the metadata. Certainly the separate parts will exist separately for purposes of implementation, but there needs to be a user-friendlier view wrapped around that. Or maybe there are (sort of) three levels, low, medium, and high; that is, the basic implementation level (=P6 direct access to OS- and FS- system calls); the combined level, where an IO or File object encompasses IO::FSnode and IO::FSdata, etc.; and a gloss-over-the-details level with lots of sugar on top (at the expense of losing control over some details). Hmm. With the quoting idea, I don't see the need for a both type of object. I mean, I'd see the code happening something like this: if (path{/path/to/file}.e) { @lines = slurp(path{/path/to/file}); } Or... if (path{/path/to/file}.e) { $handle = open(path{/path/to/file}); } (I'm using one of David's suggested syntaxes above, but I'm not closely attached to it). I guess what I'm saying here is that I think we can do the things without people having to worry about the objects being separate unless they care. So, separate objects, but hide it as much as possible. 
Is that something you're fine with?

In fact, having q, Q, or qq involved at all strikes me as wrong, since those three are specifically for generating strings.

Pathnames still are strings, so that's fine. In fact, there are different

Hmm. I'm not so sure; maybe I'm just being picky, but I want to clarify things in case it's important (in other words, I'm thinking out loud here to see if it helps). First, Q and friends don't generate strings, they generate string-like objects, which could be Str, or Match, or whatever. Think of quoting constructs as a way of temporarily switching to a different sublanguage (cf. regex), and you'll have the idea that I have in mind. As for pathnames being strings, you may be right FSVO string. But I'd say that, while they may be strings, they're not Str, but they do Str, as in

    role IO::FSNode does Str {...}

(FSNode may not be the right name here, but is used for illustrative purposes).

things going on here; one is to have a way of conveniently quoting strings that contain a lot of backslashes. Just as Perl lets you pick different quotation marks, to make it easier to quote strings that have a lot of " or ' characters, so it should have a way to make it easy to quote strings with a lot of backslashes. (The most obvious example being Windows paths; but there are other possibilities, such as needing to eval some code that already has a lot of backslashes in it.) Now, you can already turn backwhacking on or off via Q's :backslash adverb; Q:qq includes :b (and Q:q recognises a few limited escape sequences like \\). So you could say Q[C:\some\path], and you could add scalar interpolation to say Q:s[C:\some\path\$filename]. But there's no way to have all of: literal backslashes + interpolation + escaped sigils. Perhaps instead of a simple :b toggle, we could have an :escape<Str> adverb that defaults to :escape<\>? Then you could have Q:scalar:escape("^")[C:\path\with\literal^$\$filename].

Maybe a global variable?
It's an interesting idea, and I'll see how others feel :). The ultimate in path literals would be to establish a similar default delimiter. [...] `path`.size # how big is the file? Returns number. There's something that slightly jars me here... I don't like the quotation returning an IO object. (I like the conciseness, but there's something a bit off conceptually.) Hmm. But doesn't normal quoting return a Str object? And regex quoting return an object (Regex? Match? Something, anyway). Now, isn't Q:path[/some/file] just creating an IO object? Unlike /foo/, where foo just IS the pattern, /some/file is *not* an IO object, it's just a filename. So if the special path-quoting returned an IO::File::Name object, I would be perfectly happy. But you can't have $filename.size -- a fileNAME doesn't have a size, the file itself does. To get from the filename to the
Re: Filename literals
On Fri, 14 Aug 2009, Darren Duncan wrote: Richard Hainsworth wrote: Would it be possible to remove the special purpose of \ from strings within IO constructs? This would mean '\' could be used in naming paths as an alternative to '/', thus allowing windows and unix strings to be equivalent, eg. IO(:path{$root-path}/data/new) would be equivalent to IO(:path{$root-path}\data\new) The usefulness would be most evident for sub-directories as windows and unix have different ways of describing root, viz. 'C:\' versus '/'

I see problems with this considering that \ is quite universally recognized in Perl (and many other languages) as meaning an escape character, and that moreover you generally need to be able to escape characters in any context building a string.

Considering, though, that we're talking about a magic perl quoting syntax, we could offer people the option of the following two:

    q:io{C:\Windows}        # Does what you want
    q:io:qq:{C:\\Windows}   # Does the same thing

Wouldn't that cover the bases pretty well?

:)

- | Name: Tim Nelson | Because the Creator is,| | E-mail: wayl...@wayland.id.au| I am | - BEGIN GEEK CODE BLOCK Version 3.12 GCS d+++ s+: a- C++$ U+++$ P+++$ L+++ E- W+ N+ w--- V- PE(+) Y+++ PGP-+++ R(+) !tv b++ DI D G+ e++ h! y- -END GEEK CODE BLOCK-
Re: Filename literals
On Sat, 15 Aug 2009, Timothy S. Nelson wrote: Considering, though, that we're talking about a magic perl quoting syntax, we could offer people the option of the following two:

    q:io{C:\Windows}        # Does what you want
    q:io:qq:{C:\\Windows}   # Does the same thing

Wouldn't that cover the bases pretty well?

My bad -- try these:

    $file = "foo";
    Q:io{C:\Windows\$file}      # Results in C:\Windows\$file
    q:io{C:\\Windows\\$file}    # Results in the same thing
    qq:io{C:\\Windows\\$file}   # Results in C:\Windows\foo

HTH,
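The three quoting levels in the corrected example (fully raw, backslash-escaped, and interpolating) have rough counterparts in other languages, which may help when reasoning about what each form should yield. A minimal sketch in Python, where a raw string stands in for Q, an ordinary string literal for q, and `string.Template` for qq; the mapping is an analogy only, and nothing here is Perl 6 behaviour.

```python
from string import Template

file = "foo"

raw = r"C:\Windows\$file"        # analogous to Q:io{...}: nothing is interpreted
escaped = "C:\\Windows\\$file"   # analogous to q:io{...}: backslashes must be doubled
# analogous to qq:io{...}: backslashes doubled and $file interpolated
interpolated = Template("C:\\Windows\\$file").substitute(file=file)

assert raw == escaped            # both spell the same literal text
print(raw)
print(interpolated)
```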
Re: Filename literals
This whole thread seems oriented around two points:

1. Strings should not carry the burden of umpty-ump filesystem checking methods.
2. It should be possible to specify a filesystem entity using something nearly indistinguishable from standard string syntax.

I agree with the first, but the relentless pursuit of the second seems to have gone beyond the point of useful speculation. What's wrong with File('C:\Windows') or Path() or Dir() or SpecialDevice()?

Not to get all Cozens-y or anything, but chasing after ways to jam some cute string-like overloading into the syntax so that we can pull out the other overloading (which at least had the virtue of simplicity) seems pointless. The File::* functionality is probably going to be one of the very early p6 modules, and it is probably going to be in core. If that's true, why not allocate some really short names, ideally with 0 colons in them, and use them to spell out what's being done?

Neither q:io:qq:{.} nor qq:io{.} really stands out as an excellent way to say "this is a path, or directory, or file, or whatever." If it's plug-in-able, I'd take qq:file{.} or qq:dir{.} or qq:path{.}, but I'd rather see C< File q{.} >.

=Austin

Timothy S. Nelson wrote: On Sat, 15 Aug 2009, Timothy S. Nelson wrote: Considering, though, that we're talking about a magic perl quoting syntax, we could offer people the option of the following two:

    q:io{C:\Windows}        # Does what you want
    q:io:qq:{C:\\Windows}   # Does the same thing

Wouldn't that cover the bases pretty well?

My bad -- try these:

    $file = "foo";
    Q:io{C:\Windows\$file}      # Results in C:\Windows\$file
    q:io{C:\\Windows\\$file}    # Results in the same thing
    qq:io{C:\\Windows\\$file}   # Results in C:\Windows\foo

HTH,
Re: Filename literals
On Sat, 15 Aug 2009, Austin Hastings wrote: This whole thread seems oriented around two points: 1. Strings should not carry the burden of umpty-ump filesystem checking methods. 2. It should be possible to specify a filesystem entity using something nearly indistinguishable from standard string syntax. I agree with the first, but the relentless pursuit of the second seems to have gone beyond the point of useful speculation. What's wrong with File('C:\Windows') or Path() or Dir() or SpecialDevice()? Not to get all Cozens-y or anything, but chasing after ways to jam some cute string-like overloading into the syntax so that we can pull out the other overloading (which at least had the virtue of simplicity) seems pointless. The File::* functionality is probably going to be one of the very early p6 modules, and it is probably going to be in core. If that's true, why not allocate some really short names, ideally with 0 colons in them, and use them to spell out what's being done? S32/IO already specifies all these things as living in the IO namespace. That could be changed, of course. Neither q:io:qq:{.} nor qq:io{.} really stand out at excellent ways to say q:io{.} would be the normal case, unless you want variable interpolation or the like. And it would be possible to come up with shorter versions (someone suggested qf). this is a path, or directory, or file, or whatever. If it's plug-in-able, I'd take qq:file{.} or qq:dir{.} or qq:path{.}, but I'd rather see C File q{.} . I'm not particularly attached to :io if we can think of something better. These things often have a short name and a long name. I'm against file because the IO::File object models what is inside the file (ie. open/read/write/close/etc), whereas the IO::FSNode/IO::FileNode/IO::DirectoryNode/IO::LinkNode objects model stuff on the outside of the file. It's things of this second type that I'm recommending that we return here. 
We could change the names of the objects of course, but I'm keen on keeping the class that does stuff to the inside of the file separate from the class that does stuff to the outside of the file. Path might be a good alternative in my mind.

Anyway, back to the :io name. An alternative might be to have the short name be :p and the long name be :path. That would mean that we could do:

    q:p{.}

That's a fair bit shorter than Path(q{.}). Hmm. Let's compare some code samples:

    if (q:p'/path/to/file' ~~ :r) { say "Readable\n"; }
    if (Path('/path/to/file') ~~ :r) { say "Readable\n"; }

    $fobj = new IO::File(FSNode => q:p'/path/to/file');
    $fobj = new IO::File(FSNode => Path('/path/to/file'));

I used single quotes for the Path() things because I think that's what people would probably do. Now, say we want to use backslashes.

    if (Q :p {C:\Windows\file} ~~ :r) { say "Readable\n"; }
    if (Path(Q {C:\Windows\file}) ~~ :r) { say "Readable\n"; }

Ok, so they're comparable. I've used curlies here just because I thought it was a good idea :).

Anyway, we have possibilities. Further thoughts anyone?

:)
Re: Filename literals
On Sat, Aug 15, 2009 at 7:17 AM, Timothy S. Nelsonwayl...@wayland.id.au wrote: On Sat, 15 Aug 2009, Austin Hastings wrote: This whole thread seems oriented around two points: 1. Strings should not carry the burden of umpty-ump filesystem checking methods. 2. It should be possible to specify a filesystem entity using something nearly indistinguishable from standard string syntax. I agree with the first, but the relentless pursuit of the second seems to have gone beyond the point of useful speculation. What's wrong with File('C:\Windows') or Path() or Dir() or SpecialDevice()? Not to get all Cozens-y or anything, but chasing after ways to jam some cute string-like overloading into the syntax so that we can pull out the other overloading (which at least had the virtue of simplicity) seems pointless. The File::* functionality is probably going to be one of the very early p6 modules, and it is probably going to be in core. If that's true, why not allocate some really short names, ideally with 0 colons in them, and use them to spell out what's being done? S32/IO already specifies all these things as living in the IO namespace. That could be changed, of course. Neither q:io:qq:{.} nor qq:io{.} really stand out at excellent ways to say q:io{.} would be the normal case, unless you want variable interpolation or the like. And it would be possible to come up with shorter versions (someone suggested qf). this is a path, or directory, or file, or whatever. If it's plug-in-able, I'd take qq:file{.} or qq:dir{.} or qq:path{.}, but I'd rather see C File q{.} . I'm not particularly attached to :io if we can think of something better. These things often have a short name and a long name. I'm against file because the IO::File object models what is inside the file (ie. open/read/write/close/etc), whereas the IO::FSNode/IO::FileNode/IO::DirectoryNode/IO::LinkNode objects model stuff on the outside of the file. It's things of this second type that I'm recommending that we return here. 
We could change the names of the objects of course, but I'm keen on keeping the class that does stuff to the inside of the file separate from the class that does stuff to the outside of the file. Path might be a good alternative in my mind. IOW, your outside the file stuff is whatever can be done without having to open the file, and your inside the file is whatever only makes sense once the file has been opened. Correct? If so, could you give some examples of how such a distinction could be beneficial, or of how the lack of such a distinction is problematic? Anyway, back to the :io name. An alternative might be to have the short name be :p and the long name be :path. That would mean that we could do: q:p{.} Isn't there something in the spec that indicates that qq is merely shorthand for q:qq? That is, it's possible to bundle a bunch of quote adverbs together under a special quote name. If so, you might say that q:path and q:p are longhand for path: path{.} # same as q:p{.} And yes, 'path' is longer that 'q:p' - but only by one character; and it's considerably more legible. As well, this is more in keeping with what's really going on here: path{.} would be no more a string than m{.} or rx{.} are. In fact, having q, Q, or qq involved at all strikes me as wrong, since those three are specifically for generating strings. Also note the following: string # same as qq[string] 'string' # same as q[string] /pattern/ # same as m[pattern]? The ultimate in path literals would be to establish a similar default delimiter. For example, what if the backtick were pressed into service for this purpose? (No, I'm not actually suggesting this; at the very least, there would be p5 false-compatibility issues involved. This is strictly illustrative.) `path` # same as path[path] `path`.e # does that filename exist? Returns boolean. `path`.size # how big is the file? Returns number. `path`.open # Returns new file handle. That's a fair bit shorter than Path(q{.}). Hmm. 
Let's compare some code samples: if (q:p'/path/to/file' ~~ :r) { say Readable\n; } if (Path('/path/to/file') ~~ :r) { say Readable\n; } if (path'/path/to/file'.r) { say Readable; } if (`/path/to/file`.r) { say Readable; } $fobj = new IO::File(FSNode = q:p'/path/to/file'); $fobj = new IO::File(FSNode = Path('/path/to/file')); $fobj = path[/path/to/file].open; $fobj = `/path/to/file`.open; I used single quotes for the Path() things because I think that's what people would probably do. Now, say we want to use backslashes. if (Q :p {C:\Windows\file} ~~ :r) { say Readable\n; } if (Path(Q {C:\Windows\file}) ~~ :r) { say Readable\n; } if (path:win[C:\Windows\file].r) { say Readable; } Anyway, we have possibilities. Further thoughts anyone? As illustrated above, I think
Re: Filename literals
More ideas:

On Thu, 13 Aug 2009, Hinrik Örn Sigurðsson wrote:

    # bin/perl on Unix
    my $rel = qf/usr bin perl/;
    # /usr/bin/perl
    my $abs = qf[/usr bin perl];

...and on Windows, would the above result in C:\/usr\bin\perl ? :)

    # The following both result in the same object (kinda):
    # /usr/bin/perl on Unix, C:\usr\bin\perl on Windows
    my $abs = qf:unix[/usr/bin/perl];
    my $abs = qf:win[C:\usr\bin\perl];

Just thinking out loud. :)

| Name: Tim Nelson | Because the Creator is,|
| E-mail: wayl...@wayland.id.au| I am |

BEGIN GEEK CODE BLOCK
Version 3.12
GCS d+++ s+: a- C++$ U+++$ P+++$ L+++ E- W+ N+ w--- V- PE(+) Y+++ PGP-+++ R(+) !tv b++ DI D G+ e++ h! y-
-END GEEK CODE BLOCK-
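The qf:unix / qf:win idea above (one path object, two spellings) can be approximated outside Perl 6. The following is only a sketch in Python, assuming its standard pathlib module as a stand-in; it is not the proposed qf syntax, and the drive letter C: is taken from the example above.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Path components given separately, as in the qf/usr bin perl/ example.
parts = ["usr", "bin", "perl"]

# Relative and absolute spellings in the Unix flavour.
unix_rel = PurePosixPath(*parts)
unix_abs = PurePosixPath("/", *parts)

# The same components in the Windows flavour, rooted at drive C:.
win_abs = PureWindowsPath("C:\\", *parts)

print(unix_rel)  # usr/bin/perl
print(unix_abs)  # /usr/bin/perl
print(win_abs)   # C:\usr\bin\perl

# Apart from the root, both flavours hold the same component list.
print(unix_abs.parts[1:] == win_abs.parts[1:])  # True
```

The "pure" classes never touch the filesystem, so either flavour can be built on any platform; only the root and the separator differ between the two.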
Re: Filename literals
I like this way.

Would it be possible to remove the special purpose of \ from strings within IO constructs? This would mean '\' could be used in naming paths as an alternative to '/', thus allowing Windows and Unix strings to be equivalent, e.g.

    IO(:path<{$root-path}/data/new>)

would be equivalent to

    IO(:path<{$root-path}\data\new>)

The usefulness would be most evident for sub-directories, as Windows and Unix have different ways of describing root, viz. 'C:\' versus '/'.

David Green wrote:

We should start thinking about the fundamental objects for doing IO as IO-objects. They *have* names, but they aren't names, or strings, or even filehandles (although they might *have* filehandles encapsulated inside to do the actual work). A filename is merely a way to get at the actual object, just as the string "2009/1/1" can be used to get a Date object. A string, or a handle, or an inode, or some unique filesystem spec number, or anything else you can get your hands on should be fed to a constructor:

Of course, this being P6, we can have some kind of io macro that parses the single item after it:

    my $file1 = io file://some/dir/some%20file; # the quick way
    my $file2 = IO.new(:protocol<file> :uri<foo/bar/a file.html>); # the verbose way
Re: Filename literals
On Thu, 13 Aug 2009, Hinrik Örn Sigurðsson wrote:

Imagine two roles, Filename and Dirname (or Path::File / Path::Dir). I

...or imagine just one, called IO::FSNode.

http://perlcabal.org/syn/S32/IO.html#IO::FSNode

Btw, kudos for the special quoting idea -- I love it :).

And in response to David Green and his comment about working with file data vs. metadata, as a systems programmer, I've written a fair number of programs that have worried a fair bit about the metadata in the filesystem; sometimes you want to read data, and sometimes metadata. That's why the Draft IO spec specifies two separate objects.

HTH,

| Name: Tim Nelson | Because the Creator is,|
| E-mail: wayl...@wayland.id.au| I am |
Re: Filename literals
On Fri, 14 Aug 2009, Timothy S. Nelson wrote:

On Thu, 13 Aug 2009, Hinrik Örn Sigurðsson wrote:

Imagine two roles, Filename and Dirname (or Path::File / Path::Dir). I

...or imagine just one, called IO::FSNode.

Sorry, I was stupiding again. I'll ask you to imagine 4:

    IO::FSNode
    |
    +-IO::FileNode
    |
    +-IO::DirectoryNode
    |
    +-IO::LinkNode

Role composition tree depicted above.

| Name: Tim Nelson | Because the Creator is,|
| E-mail: wayl...@wayland.id.au| I am |
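The four-node tree above can be pictured with a small sketch. This is Python rather than Perl 6, uses plain subclasses where the draft spec talks about role composition, and the node_for helper is a hypothetical name introduced only for this illustration.

```python
import tempfile
from pathlib import Path


class FSNode:
    """Anything that has a name in the filesystem."""

    def __init__(self, path):
        self.path = Path(path)


class FileNode(FSNode):
    """A regular file."""


class DirectoryNode(FSNode):
    """A directory."""


class LinkNode(FSNode):
    """A symbolic link (checked first, so links are not followed)."""


def node_for(path):
    """Hypothetical factory: pick the node class by asking the filesystem."""
    p = Path(path)
    if p.is_symlink():
        return LinkNode(p)
    if p.is_dir():
        return DirectoryNode(p)
    if p.is_file():
        return FileNode(p)
    return FSNode(p)  # missing, or some other kind of node


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "example.txt"
    sample.write_text("hello")
    print(type(node_for(tmp)).__name__)     # DirectoryNode
    print(type(node_for(sample)).__name__)  # FileNode
```

Checking for a link before the other two matters: is_dir and is_file follow symlinks, so the order of the tests decides whether a link to a directory counts as a link or as a directory.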
Re: Filename literals
On 2009-Aug-14, at 5:36 am, Richard Hainsworth wrote:

Would it be possible to remove the special purpose of \ from strings within IO constructs?

It's P6, anything's possible! I probably wouldn't change [what look like] ordinary quoted strings, but maybe something with a qf//-type construct (or would that be qf\\ ?!). My idea of using a macro was to grab almost anything that wasn't whitespace, and it doesn't have to be parsed like a normal string, so \ could be interpreted as a dir separator. Of course, / has become so standard that it even works on Windows (kind of); on the other hand, being able to use either (or both) would be convenient for a lot of people.

On 2009-Aug-14, at 7:18 am, Leon Timmermans wrote:

I don't think that's a good idea. In general, parsing a URI isn't that easy; in particular, determining the end is undefined AFAIK. In your example the semicolon should probably be considered part of the URI, even though that's obviously not what you intended.

Well, we can encode a URI any way we like -- I was thinking of anything up to the next whitespace or semicolon, maybe allowing for balanced brackets; and internal semicolons, etc. being %-encoded. I guess the argument would be that using an encoding that looks almost-but-not-quite like other popular ways of representing URIs could be confusing, and people would be tempted to paste in addresses from their browser without re-encoding them the P6 way.

Maybe it's more practical to permit only URIs with little to no punctuation to be unquoted, and quote anything else? Not that quoting is such a great hardship anyway.

On 2009-Aug-14, at 7:41 am, Timothy S. Nelson wrote:

And in response to David Green and his comment about working with file data vs. metadata, as a systems programmer, I've written a fair number of programs that have worried a fair bit about the metadata in the filesystem; sometimes you want to read data, and sometimes metadata.
Of course; and when I referred to low-level and high-level, there isn't really a distinct dividing line between the two. Is getting (or setting) the modification date on a file low-level because it's metadata, or high-level because it's a simple, ordinary task? I don't particularly care about the classification; I just wanted to make the point that P6 should make it possible to gloss over anything that's over-glossable. -David
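To make the data-versus-metadata split concrete, here is a rough sketch in Python (not Perl 6, and not the draft IO spec). It assumes the usual os.stat and os.access calls for the "outside" of a file and an ordinary open for the "inside"; the file name notes.txt is made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    name = os.path.join(tmp, "notes.txt")

    # Create a small file to look at (binary mode, so the size is exact).
    with open(name, "wb") as fh:
        fh.write(b"hello\n")

    # "Outside" the file: metadata, obtained without opening it.
    info = os.stat(name)
    size = info.st_size
    modified = info.st_mtime
    readable = os.access(name, os.R_OK)

    # "Inside" the file: the contents, which need a file handle.
    with open(name, "rb") as fh:
        first_line = fh.readline()

print(size)        # 6
print(readable)    # True
print(first_line)  # b'hello\n'
```

A short form such as $file.size would only need the first half, while something like the :T and :B tests would need the second, which is the distinction the two separate objects are meant to capture.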
Re: Filename literals
Richard Hainsworth wrote:

Would it be possible to remove the special purpose of \ from strings within IO constructs? This would mean '\' could be used in naming paths as an alternative to '/', thus allowing windows and unix strings to be equivalent, eg.

    IO(:path<{$root-path}/data/new>)

would be equivalent to

    IO(:path<{$root-path}\data\new>)

The usefulness would be most evident for sub-directories as windows and unix have different ways of describing root, viz. 'C:\' versus '/'

I see problems with this, considering that \ is quite universally recognized in Perl (and many other languages) as an escape character, and that moreover you generally need to be able to escape characters in any context that builds a string.

Considering that, AFAIK, practically every modern file system, including those used by Windows such as NTFS, is Unicode-savvy and can have any character in a file name, if \ is used literally to denote itself, then what is a simple, clean way to denote other characters that would otherwise be denoted with an escape sequence?

I think it would be best, as well as preserving the principle of least surprise, if all of the same escaping syntaxes work universally across character-string-like contexts, which means that a literal \ means escaping.

The best compromise that I see is that Windows filenames can be spelled out as Windows people are used to, except that / is used instead of \, so that a Windows path begins with 'C:/', for example.

Or even if the '/' paradigm for root is used in Windows, which may actually be best, the drive letter or drive name still needs to be in the path somewhere so that multiple drives can be distinguished; for example, 'C:\' becomes '/C/'.

Under Mac OS X, all drives, root or otherwise, are accessible under '/Volumes/drive-name/...', and Unix in general lets you mount drives anywhere. I imagine Windows supports more ways of denoting drives than the drive letter, but either way I don't see a problem here.

-- Darren Duncan
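Both points above, that a bare \ collides with escape sequences and that 'C:\' could be respelled as '/C/', can be sketched in a few lines. This is Python, not Perl 6; drive_as_root_dir is a hypothetical helper written for this illustration, and it only handles absolute drive-letter paths.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Why a literal backslash is awkward: in an ordinary string \n is a newline,
# so the first literal is one character shorter than the raw one.
print(len("C:\new"), len(r"C:\new"))  # 5 6


def drive_as_root_dir(win_path):
    """Hypothetical mapping of C:\\dir\\file to /C/dir/file."""
    p = PureWindowsPath(win_path)
    # Accept only absolute paths with a one-letter drive, such as C:\dir.
    if len(p.drive) != 2 or p.drive[1] != ":" or not p.root:
        raise ValueError(f"not an absolute drive-letter path: {win_path!r}")
    letter = p.drive[0]
    # p.parts[0] is the anchor ('C:\\'); the rest are the plain components.
    return str(PurePosixPath("/", letter, *p.parts[1:]))


print(drive_as_root_dir(r"C:\Windows\file"))  # /C/Windows/file
print(drive_as_root_dir("C:/Windows/file"))   # /C/Windows/file
```

Either spelling of the separator gives the same result here, so the '/'-only convention loses nothing except the drive-relative and UNC forms, which this sketch deliberately rejects.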
Re: Filename literals
On Fri, Aug 14, 2009 at 3:35 PM, Darren Duncan <dar...@darrenduncan.net> wrote:

Under Mac OS X, all drives, root or otherwise, are accessible under '/Volumes/drive-name/...', and Unix in general lets you mount drives anywhere. I imagine Windows supports more ways of denoting drives than the drive letter.

Nope. Have to use the drive letter. But / is understood as a synonym for \ by the Windows API.

-- Mark J. Reed <markjr...@gmail.com>
Re: Filename literals
On Aug 14, 2009, at 16:17, Mark J. Reed wrote:

On Fri, Aug 14, 2009 at 3:35 PM, Darren Duncan <dar...@darrenduncan.net> wrote:

Under Mac OS X, all drives, root or otherwise, are accessible under '/Volumes/drive-name/...', and Unix in general lets you mount drives anywhere. I imagine Windows supports more ways of denoting drives than the drive letter.

Nope. Have to use the drive letter. But / is understood as a synonym for \ by the Windows API.

UNC drive specs should work as well, i.e. \\MYHOST\C\... (or swap / for \).

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allb...@kf8nh.com
system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university    KF8NH
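The two claims above (that / and \ are interchangeable, and that UNC specs carry their own "drive") can be checked against one existing path parser. The sketch below uses Python's pathlib.PureWindowsPath as a model; it shows how that library parses Windows paths, not what the Windows API itself does, and MYHOST is just the host name from the example above.

```python
from pathlib import PureWindowsPath

# Backslash and forward-slash spellings parse to equal paths.
back = PureWindowsPath(r"C:\Windows\file")
fwd = PureWindowsPath("C:/Windows/file")
print(back == fwd)  # True

# A UNC spec: the host and share together play the role of the drive.
unc = PureWindowsPath(r"\\MYHOST\C\dir\file")
print(unc.drive)    # \\MYHOST\C
print(unc.parts[1:])

# The same UNC spec with the slashes swapped.
unc_fwd = PureWindowsPath("//MYHOST/C/dir/file")
print(unc == unc_fwd)  # True
```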
Re: Filename literals
On Fri, Aug 14, 2009 at 7:41 PM, David Green <david.gr...@telus.net> wrote:

Well, we can encode a URI any way we like -- I was thinking of anything up to the next whitespace or semicolon, and internal semicolons, etc. being %-encoded.

Semicolons are reserved characters in URIs: inappropriately percent-encoding semicolons would be in direct violation of RFC 3986. Using them as a delimiter would break many perfectly valid URIs. That's absolutely a no-go IMNSHO. Breaking up at whitespace should work; see appendix C of RFC 3986 for recommendations on that.

Maybe it's more practical to permit only URIs with little to no punctuation to be unquoted, and quote anything else? Not that quoting is such a great hardship anyway

Maybe, but if I can't use it half of the time it may as well be omitted. Quoting should be relatively easy, because URIs have a wide range of characters that can't be in them anyway.

Leon
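The difference between ending an unquoted URI at whitespace and ending it at a semicolon is easy to see on a sample. The sketch below is Python, with a hypothetical take_uri helper and a made-up example.com address; it only illustrates the delimiting argument and is not a proposal for how a Perl 6 io macro would parse.

```python
from urllib.parse import quote


def take_uri(text):
    """Hypothetical helper: the URI is everything up to the first whitespace."""
    pieces = text.split(None, 1)
    uri = pieces[0] if pieces else ""
    rest = pieces[1] if len(pieces) > 1 else ""
    return uri, rest


statement = "http://example.com/path;type=a?x=1;y=2 ; say 'done'"

# Ending at whitespace keeps the URI's own semicolons.
uri, rest = take_uri(statement)
print(uri)   # http://example.com/path;type=a?x=1;y=2
print(rest)  # ; say 'done'

# Ending at the first semicolon cuts a valid URI short.
print(statement.split(";", 1)[0])  # http://example.com/path

# Blanket percent-encoding rewrites the reserved semicolon ...
print(quote("a;b"))            # a%3Bb
# ... unless it is explicitly declared safe.
print(quote("a;b", safe=";"))  # a;b
```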
Re: Filename literals
I'll just butt in here and say that while the URI format is nice for alternate schemes, it is not nice for accessing files. The general case in most programming languages is to assume that a non-URI file name is local; specifying file://wherever/whatever/filename is unnecessary additional syntax. Also, perhaps only URLs should be permitted; they do after all specify a location.

I'm unsure whether this should be part of a central specification to Perl 6 or part of a module.

I think I like Hinrik's original proposal.

Oh, and regarding file names in Windows, this document should be a pretty definitive guide: http://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx

-- Jan
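The "no scheme means local" convention described above could look roughly like this. It is a Python sketch with a hypothetical split_location helper; treating a one-letter scheme as a Windows drive letter is an assumption made for this example, not an established rule.

```python
from urllib.parse import unquote, urlsplit


def split_location(name):
    """Hypothetical helper: return (kind, value) for a file name or URL."""
    parts = urlsplit(name)
    # No scheme, or a one-letter "scheme" that is really a drive letter.
    if len(parts.scheme) <= 1:
        return ("local", name)
    # An explicit file: URL is still local; decode its %-escapes.
    if parts.scheme == "file":
        return ("local", unquote(parts.path))
    # Anything else is left to whatever handles that scheme.
    return (parts.scheme, name)


print(split_location("/usr/bin/perl"))            # ('local', '/usr/bin/perl')
print(split_location("file:///tmp/some%20file"))  # ('local', '/tmp/some file')
print(split_location("http://example.com/x"))     # ('http', 'http://example.com/x')
print(split_location(r"C:\Windows\file"))         # ('local', 'C:\\Windows\\file')
```

The drive-letter special case shows why the two notations sit uneasily together: a plain URL parser reads C: as a scheme, so a combined literal needs some rule like this one to keep ordinary Windows names local.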
Re: Filename literals
Hinrik Örn Sigurðsson wrote:

I was wondering if there had been any discussion about how to type file and directory names in Perl 6. I've read a couple of posts about file test operators, where some have suggested making filenames special, either as a subtype of Str or something else entirely. That way Str wouldn't have all these file test methods, which is good because not all strings are valid filenames.

<snip>

Considering that in the general case a file name can be any string at all, if it is going to have its own type at all, it should be disjoint from Str in the same manner that, say, Instant and Duration are disjoint from Num/Rat.

When I say disjoint, I mean conceptually that FileName, say, has an attribute of type Str rather than being defined as a subtype of Str.

-- Darren Duncan
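The "has a Str" versus "is a Str" distinction can be shown with a very small sketch. It is written in Python, so str stands in for Str, and the FileName class and its exists method are hypothetical names chosen for this illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileName:
    """Disjoint from str: it holds a string instead of subclassing one."""

    text: str

    def exists(self):
        # A file test lives here, not on every string in the program.
        return os.path.exists(self.text)


name = FileName("some/file/name")

print(isinstance(name, str))              # False
print(hasattr("plain string", "exists"))  # False
print(name.text.upper())                  # SOME/FILE/NAME
```

String operations stay available through the attribute, while the file tests are confined to the wrapper, which is the separation the original question was after.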