On 2009-Aug-17, at 8:36 am, Jon Lang wrote:
Timothy S. Nelson wrote:
Well, my main thought in this context is that the stuff that can be
done to the inside of a file can also be done to other streams -- TCP
sockets for example (I know, there are differences, but the two are a lot the same), whereas metadata makes less sense in the context of TCP sockets;

But any IO object might have metadata; some different from the metadata you traditionally get with files, and some the same, e.g. $io.size, $io.times{modified}, $io.charset, $io.type.

if (path{/path/to/file}.e) {
       @lines = slurp(path{/path/to/file});
}
(I'm using one of David's suggested syntaxes above, but I'm not
closely attached to it).

I suggested variations along the line of: io "/path/to/file". It amounts to much the same thing, but it's important conceptually to distinguish a pathname from the thing it names. (A path doesn't have a modification date, a file does.) Also, special quoting/escaping could apply to other things, not limited to "filenames". That said, I don't think it's unreasonable to want to combine both operations for brevity, but the io-constructor should have built-in path parsing, not the other way around.

I guess what I'm saying here is that I think we can do the things without people having to worry about the objects being separate unless they care. So, separate objects, but hide it as much as possible. Is that
something you're fine with?

Yes -- to me that means some class/role that wraps up all the pieces together, but all the separate components are still there underneath. But I'm not too bothered about how it's implemented as long as it's transparent for casual use.

    my $file = io p[/some/file];
    my $contents = $file.data;
    my $mod-date = $file.times{modified};
    my $size = $file.size;
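
A rough sketch of that kind of wrapper, written in Python only because the Perl 6 syntax above is still up in the air. The class name FileView and its attribute names are hypothetical, chosen to mirror $file.data, $file.size and $file.times; the pathname remains a separate component underneath.

```python
import os
import tempfile
from datetime import datetime, timezone


class FileView:
    """Hypothetical wrapper: one object exposing a file's data and metadata."""

    def __init__(self, path):
        self.path = path  # the pathname stays a separate, visible component

    @property
    def data(self):
        # The contents of the file, read on demand.
        with open(self.path, "rb") as handle:
            return handle.read()

    @property
    def size(self):
        return os.stat(self.path).st_size

    @property
    def times(self):
        # Timestamps as reported by the filesystem, keyed by name.
        stat = os.stat(self.path)
        return {
            "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
            "accessed": datetime.fromtimestamp(stat.st_atime, tz=timezone.utc),
        }


def demo():
    # Create a throwaway file so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        name = os.path.join(tmp, "example.txt")
        with open(name, "wb") as handle:
            handle.write(b"hello\n")
        view = FileView(name)
        return view.data, view.size, sorted(view.times)
```

Casual code only ever touches view.data or view.size; anyone who cares about the split can still reach view.path directly.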


Pathnames still are strings, so that's fine. In fact, there are different
As for pathnames being strings, you may be right FSVO string. But I'd say that, while they may be strings, they're not Str, but they do Str

Agreed, pathnames are "almost" strings, but worth distinguishing conceptually. There should be a URL type that does Str.

Actually, there are other differences, like case-insensitivity and illegal chars. Unfortunately, those depend on the given filesystem. As long as you're dealing with one FS at a time, that's OK; it probably means we have IO::Name::ext3, IO::Name::NTFS, IO::Name::HFS, etc. But what happens when you cross FS-barriers? Does a case-sensitive name match a case-insensitive one? Is filename-equality not commutative or not transitive? If you're looking for a filename "foo" on Mac/Win, then a file actually called "FOO" matches; but on Unix it wouldn't.

(Actually, Macs can do both IO::Name::HFS::case-insensitive and IO::Name::HFS::case-sensitive. Eek.)
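
For comparison, here is how one existing library answers the cross-FS question. Python's pathlib gives each naming convention its own class (much like the IO::Name::* idea) and simply refuses to call names of different flavours equal. This is an illustration of one possible answer, not a proposal for how Perl 6 must behave.

```python
from pathlib import PurePosixPath, PureWindowsPath

unix_lower, unix_upper = PurePosixPath("foo"), PurePosixPath("FOO")
win_lower, win_upper = PureWindowsPath("foo"), PureWindowsPath("FOO")

# Within one flavour, equality follows that flavour's rules.
same_on_unix = unix_lower == unix_upper        # case-sensitive comparison
same_on_windows = win_lower == win_upper       # case-insensitive comparison

# Across flavours, names are never equal, so the awkward
# "is it transitive?" question does not arise.
same_across = unix_lower == win_lower
```

That keeps equality well-behaved, at the price of never matching "foo" against "FOO" across a filesystem boundary.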

I'd like Perl 6's treatment of filenames to be smart enough that smart-matching any of these pairs of "alternative spellings" would result in a successful match. So while I'll agree that filenames are string-like, I really don't want them to _be_ strings.

Well, the *files* are the same, but the pathnames are different. I'm not sure whether some differences in "spelling" should be ignored by default or not. There are actually several different kinds; S32 has a method "realpath", but I think "canonical" is a better name, because aliases can be just as "real" as the canonical path, e.g. a web page with multiple addresses. Or hard links rather than soft links -- though in that case, there is no one "canonical" path. It may not even be possible to easily tell if there is one or not.

Some ways in which different paths can be considered equivalent:
    Spelling: C:\PROGRA~1, case-insensitivity
    Simplification: foo/../bar/ to bar/
    Resolution: of symlinks/shortcuts
    Content-wise: hard links/multiple addresses

Depending on the circumstances, you might want any of those to count as the "same" file; or none of them. We'll need methods for each sort of transformation, $path.canonical, $path.normalize, $path.simplify, etc. Two high-level IO objects are "the same", regardless of path, if $file1 =:= $file2 (which might compare inodes, etc.). There should be a way to set what level of sameness applies in a given lexical scope; perhaps the first two listed above are a reasonable default to start with.
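
To make three of those levels concrete, here is a small Python sketch. The function names simplify, resolve and same_content are hypothetical labels for the transformations discussed above; the os.path calls underneath are standard. The "spelling" level is left out because it depends on the filesystem in use.

```python
import os
import tempfile


def simplify(path):
    """Textual simplification only: foo/../bar -> bar. Never touches the disk."""
    return os.path.normpath(path)


def resolve(path):
    """Follow symlinks to reach one resolved spelling of the path."""
    return os.path.realpath(path)


def same_content(path_a, path_b):
    """True when both names reach the same file (device and inode match)."""
    return os.path.samefile(path_a, path_b)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "bar.txt")
        with open(original, "w") as handle:
            handle.write("x")
        hard = os.path.join(tmp, "hard.txt")
        os.link(original, hard)  # hard link: a second, equally "real" name
        roundabout = os.path.join(tmp, "foo", "..", "bar.txt")
        return (
            simplify(roundabout) == simplify(original),  # simplification
            resolve(hard) == resolve(original),          # resolution
            same_content(hard, original),                # content-wise
        )
```

With a hard link, resolving the two names does not make them equal, since neither is more canonical than the other; only the content-wise check treats them as the same file.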

There's something that slightly jars me here... I don't like the quotation returning an IO object.
But doesn't normal quoting return a Str object? And regex quoting return an object (Regex? Match? Something, anyway).

Certainly, but a regex doesn't produce a Signature object, say. I don't object to objects, just to creating objects, then doing something with them, then returning another kind of object, and calling that "parsing". If we're parsing the characters, we should end up with an IO::Name. If we end up with an IO::actual-file/stream-whatever, then we should call it something else (like an "io constructor").

The difference in our approaches is that you seem keen to integrate
closely the data and the metadata, whereas I'm trying to integrate the paths
and the metadata.

Well, paths are just metadata too, although typically the most important kind. (You could even have an IO without a path or name.) I want a view that integrates all of them, because that's how people ordinarily think about files, unless they have a specific reason not to.

$ echo $PATH
/home/wayland/local/bin:/usr/global/bin:/usr/local/bin:/bin:/usr/bin:/usr/sbin:/usr/local/sbin:/sbin
       Now, which of these is the path?

Ah, the arguably-poorly-named $PATH is equivocating on the meaning of "path". It's really a path of paths, that is, a search-path of file-paths. In Perl, $*ENV{PATH} should be an array of IO's. In fact, "paths" in Perl should also be arrays -- they'd stringify to a bunch of characters separated by slashes (or something else if the settings say so), but really a path is a bunch of separate dirs, so perl should represent them that way. All the easier to do common operations like popping one or more segments off the end of a path.
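
A minimal Python sketch of both ideas, using a shortened copy of the $PATH value quoted above: the search-path becomes a list of paths, and each path is a sequence of segments that only turns back into slash-separated text when stringified.

```python
from pathlib import PurePosixPath

search_path = "/home/wayland/local/bin:/usr/global/bin:/usr/local/bin:/bin"

# A search path is a list of file paths ...
directories = [PurePosixPath(entry) for entry in search_path.split(":")]

# ... and each file path is itself a sequence of segments.
segments = directories[0].parts

# Popping segments off the end is then ordinary sequence work.
two_up = PurePosixPath(*segments[:-2])

# Stringifying joins the segments back together with the separator.
as_text = str(two_up)
```

Here segments is ('/', 'home', 'wayland', 'local', 'bin') and as_text is "/home/wayland".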

if $file ~~ path[./*.txt] {
Is that just a regex, in fact?
       No, we're talking about a globbing sublanguage

Oh, I should have said "Isn't that equivalent to a regex" -- that is, I would consider globs distinct from paths just as regexes are distinct from strings. I guess that since glob-syntax has far fewer special characters than regexes, you could more easily get away with making all path-literals globby, but I'd still want to distinguish them. Again, glob-parsing is not conceptually restricted to filenames (that's just the context we're most familiar with). It would be fine to have Q:glob as a simplified cousin to Q:regex, and it might occasionally be useful to use on plain strings.
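
As an illustration of "globs are to paths as regexes are to strings", Python's fnmatch module will translate a glob into an equivalent regex and match it against arbitrary strings, with no filesystem involved. The sample names are made up for the example.

```python
import fnmatch
import re

pattern = "*.txt"

# A glob can be compiled into an equivalent regex ...
as_regex = re.compile(fnmatch.translate(pattern))

# ... and applied to any string, not just to names of existing files.
names = ["notes.txt", "notes.txt.bak", "README"]
matched_by_glob = [n for n in names if fnmatch.fnmatchcase(n, pattern)]
matched_by_regex = [n for n in names if as_regex.match(n)]
```

Both lists come out the same. Note that fnmatch does not treat "/" specially, so a path-aware glob would need extra rules on top of this.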

I was wanting to replace the "glob" language with something more like XPath, but that idea was vetoed by people who didn't want Tree-related objects to be part of the core, so I'm doing that as a library.

I'm all for some tree-related fun(ctions). A tree is basically a hash of hashes, so I'm surprised we don't have a few functions for traversing them and other very basic hashy concepts. But I would like to see XPath-type stuff hashed out [pun intended] anyway -- whether it ends up in a third-party module or not isn't such a big deal when it comes to P6, and somebody will have to figure how to do it in a perlish way eventually.

 if $file.type ~~ MIME("text/plain") {...}
Cool idea.  How would the type be determined?  Are you thinking of
the algorithms in the unix "file" utility?  Please tell me you're not
planning to use filename extensions -- that's bad :).
Wouldn't $file.type be metadata?


Yes; and yes, filename extensions are evil, but of course thanks to primitive filesystems, we're stuck with them to a large extent. And there's no perfect solution, but it would be useful for Perl to stick as closely to the FS/OS's idea of types as it can. Sometimes that would mean looking up an extension; it might mean using (or emulating) "file" magic; it might mean querying the FS for a MIME-type or a UTI. After all, the filename extension may not actually match the correct type of the file.
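
A sketch of how those strategies might be layered, in Python. The MAGIC table is a hypothetical, deliberately tiny stand-in for the "file" utility's database, and the function names are invented for the example; mimetypes.guess_type is the standard extension lookup. Querying the FS for a stored type or UTI is not shown.

```python
import mimetypes

# Hypothetical, deliberately tiny table of leading "magic" bytes.
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"%PDF-": "application/pdf",
    b"\x1f\x8b": "application/gzip",
}


def type_from_name(filename):
    """Guess from the extension alone; returns None when it is unknown."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed


def type_from_content(head):
    """Guess from the first bytes, in the spirit of the "file" utility."""
    for magic, mime in MAGIC.items():
        if head.startswith(magic):
            return mime
    return None


def best_guess(filename, head):
    """Prefer what the content says; fall back to the extension."""
    return type_from_content(head) or type_from_name(filename)
```

So a PDF that someone has renamed to report.txt is still reported as application/pdf, while a genuine text file falls back to the extension lookup.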

On 2009-Aug-17, at 6:16 pm, Timothy S. Nelson wrote:
The question is, which of the following does "metadata" mean?:
1       The metadata that the filesystem attaches to the file
2       All the information that can be gathered without opening the file
3       Any information we can gather about the file that isn't the actual data contained in the file, but may involve reading at least part of it
4       Something else


Probably all of the above. Strictly speaking, "data" is the contents of a file, and "metadata" is anything else that relates to the file in any way, including information extrapolated from the contents. A byte-order mark is metadata (it tells you something *about* the file) even though it's inside, right? Or a char-set declaration inside an HTML file, or -T....
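
The byte-order mark case can be sketched in a few lines of Python: the first bytes are peeled off and reported as metadata (an encoding name), and the rest is handed back as the data. The helper name split_bom is hypothetical; the BOM constants come from the standard codecs module.

```python
import codecs

# UTF-32 marks are checked before UTF-16 ones, because the UTF-32-LE
# mark begins with the same two bytes as the UTF-16-LE mark.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def split_bom(raw):
    """Return (encoding name or None, remaining bytes without the mark)."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name, raw[len(bom):]
    return None, raw
```

Whether those leading bytes count as data or metadata depends entirely on which of the two return values you are looking at.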

Philosophically, there's no hard distinction; only whatever point of view is useful for the task at hand. If you "use IO::Filesystem::Gzip" then you should be able to treat "a" gzip file as a bunch of separate files with separate contents and metadata [even though when you look at it from a different perspective, it's all just "data" in a single .gz file].
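
To illustrate the two perspectives, here is a Python sketch that assumes the archive is a gzip-compressed tar (a bare .gz holds only one stream). The helper names build_archive and view_as_files are hypothetical. The same bytes are first just "data", then a set of separate files with their own names, sizes and contents.

```python
import io
import tarfile


def build_archive(files):
    """Pack a {name: bytes} mapping into an in-memory .tar.gz."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def view_as_files(blob):
    """Read the same bytes back as separate files with their own metadata."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as archive:
        return {
            member.name: (member.size, archive.extractfile(member).read())
            for member in archive.getmembers()
        }
```

From the outside, the result of build_archive is one opaque bytes value; view_as_files is the "filesystem" perspective on exactly the same bytes.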


-David
