Re: Filename literals

David Green Tue, 18 Aug 2009 04:20:18 -0700

On 2009-Aug-17, at 8:36 am, Jon Lang wrote:

Timothy S. Nelson wrote:
Well, my main thought in this context is that the stuff that can be
done to the inside of a file can also be done to other streams -- TCP
sockets for example (I know, there are differences, but the two are a lot the same), whereas metadata makes less sense in the context of TCP sockets;

But any IO object might have metadata; some different from the metadata you traditionally get with files, and some the same, e.g. $io.size, $io.times{modified}, $io.charset, $io.type.
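
Timothy's point that file operations apply to other streams, and David's that every IO object carries some metadata, can both be seen in Python. Python is used here only as an illustration, because the Perl 6 syntax in this thread is a proposal. The helper `count_lines` is hypothetical: it reads any iterable byte stream, whether an in-memory buffer or a connected socket, and neither of those has a modification date to report.

```python
import io
import socket

def count_lines(stream):
    """Hypothetical helper: count lines from any iterable byte stream."""
    return sum(1 for _ in stream)

# An in-memory buffer behaves like a file...
buffer_lines = count_lines(io.BytesIO(b"one\ntwo\n"))
print(buffer_lines)                  # 2

# ...and so does a connected socket pair, once wrapped as a file-like object.
a, b = socket.socketpair()
a.sendall(b"one\ntwo\nthree\n")
a.shutdown(socket.SHUT_WR)           # signal end-of-stream to the reader
with b.makefile("rb") as reader:
    socket_lines = count_lines(reader)
a.close()
b.close()
print(socket_lines)                  # 3
```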

if (path{/path/to/file}.e) {
       @lines = slurp(path{/path/to/file});
}
(I'm using one of David's suggested syntaxes above, but I'm not
closely attached to it).

I suggested variations along the line of: io "/path/to/file". It amounts to much the same thing, but it's important conceptually to distinguish a pathname from the thing it names. (A path doesn't have a modification date, a file does.) Also, special quoting/escaping could apply to other things, not limited to "filenames". That said, I don't think it's unreasonable to want to combine both operations for brevity, but the io-constructor should have built-in path parsing, not the other way around.
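
Python's pathlib draws roughly the same line, and a sketch in it may help: a pure path is parsed text that never touches the disk, while size and dates come from the file it names. The function `io_constructor` is a hypothetical stand-in for the proposed io-constructor with built-in path parsing, not an existing API.

```python
import os
import tempfile
from pathlib import Path, PurePath

def io_constructor(text):
    """Hypothetical io-constructor: parse text into a pathname, then bind it to the file it names."""
    name = PurePath(text)    # parsing only: no filesystem access happens here
    return Path(name)        # a concrete path that can ask the filesystem about the file

# A pure path is just a name; it has no stat() and hence no modification date.
name = PurePath("/path/to/file")
print(hasattr(name, "stat"))          # False

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "file")
    with open(target, "wb") as fh:
        fh.write(b"hello\n")
    f = io_constructor(target)
    size = f.stat().st_size           # metadata of the file, not of the name
print(size)                           # 6
```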

I guess what I'm saying here is that I think we can do the things without people having to worry about the objects being separate unless they care. So, separate objects, but hide it as much as possible. Is that
something you're fine with?

Yes -- to me that means some class/role that wraps up all the pieces together, but all the separate components are still there underneath. But I'm not too bothered about how it's implemented as long as it's transparent for casual use.


    my $file = io p[/some/file];
    my $contents = $file.data;
    my $mod-date = $file.times{modified};
    my $size = $file.size;
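
A rough Python analogue of the snippet above, assuming a hypothetical class `IOFile` whose `data`, `times` and `size` attributes mirror the Perl 6 names: the path stays a separate component underneath, and the metadata is fetched from the filesystem only when asked for.

```python
import os
import tempfile
from pathlib import Path

class IOFile:
    """Hypothetical wrapper: separate path and metadata components behind one object."""

    def __init__(self, path):
        self.path = Path(path)          # the name, kept as its own component

    @property
    def data(self):
        return self.path.read_bytes()   # the file's contents

    @property
    def times(self):
        st = self.path.stat()           # filesystem metadata, fetched on demand
        return {"modified": st.st_mtime, "accessed": st.st_atime}

    @property
    def size(self):
        return self.path.stat().st_size

with tempfile.TemporaryDirectory() as d:
    f = IOFile(os.path.join(d, "file"))
    f.path.write_bytes(b"some data")
    contents = f.data
    mod_date = f.times["modified"]
    size = f.size
```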

Pathnames still are strings, so that's fine. In fact, there are different
As for pathnames being strings, you may be right FSVO string. But I'd say that, while they may be strings, they're not Str, but they do Str

Agreed, pathnames are "almost" strings, but worth distinguishing conceptually. There should be a URL type that does Str.

Actually, there are other differences, like case-insensitivity and illegal chars. Unfortunately, those depend on the given filesystem. As long as you're dealing with one FS at a time, that's OK; it probably means we have IO::Name::ext3, IO::Name::NTFS, IO::Name::HFS, etc. But what happens when you cross FS-barriers? Does a case-sensitive name match a case-insensitive one? Is filename-equality not commutative or not transitive? If you're looking for a filename "foo" on Mac/Win, then a file actually called "FOO" matches; but on Unix it wouldn't.

(Actually, Macs can do both IO::Name::HFS::case-insensitive and IO::Name::HFS::case-sensitive. Eek.)
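
Python's pathlib hits the same cross-filesystem question and answers it by OS "flavour" rather than by the actual filesystem. The comparisons below show its documented behaviour; note that on a Mac, paths use the POSIX flavour and so compare case-sensitively even on a case-insensitive volume.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows-flavoured paths compare case-insensitively...
print(PureWindowsPath("foo") == PureWindowsPath("FOO"))   # True
# ...POSIX-flavoured paths do not...
print(PurePosixPath("foo") == PurePosixPath("FOO"))       # False
# ...and paths of different flavours never compare equal, even when spelled identically.
print(PureWindowsPath("foo") == PurePosixPath("foo"))     # False
```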

I'd like Perl 6's treatment of filenames to be smart enough that smart-matching any of these pairs of "alternative spellings" would result in a successful match. So while I'll agree that filenames are string-like, I really don't want them to _be_ strings.

Well, the *files* are the same, but the pathnames are different. I'm not sure whether some differences in "spelling" should be ignored by default or not. There are actually several different kinds; S32 has a method "realpath", but I think "canonical" is a better name, because aliases can be just as "real" as the canonical path, e.g. a web page with multiple addresses. Or hard links rather than soft links -- though in that case, there is no one "canonical" path. It may not even be possible to easily tell if there is one or not.


Some ways in which different paths can be considered equivalent:
    Spelling: C:\PROGRA~1, case-insensitivity
    Simplification: foo/../bar/ to bar/
    Resolution: of symlinks/shortcuts
    Content-wise: hard links/multiple addresses

Depending on the circumstances, you might want any of those to count as the "same" file, or none of them. We'll need methods for each sort of transformation: $path.canonical, $path.normalize, $path.simplify, etc. Two high-level IO objects are "the same", regardless of path, if $file1 =:= $file2 (which might compare inodes, etc.). There should be a way to set what level of sameness applies in a given lexical scope; perhaps the first two listed above are a reasonable default to start with.
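
Three of the four levels of sameness map onto existing Python calls, shown here as a sketch on a throwaway directory (it assumes an OS that permits symlinks and hard links). `os.path.normpath` is purely lexical simplification, `os.path.realpath` resolves symlinks, and `os.path.samefile` compares device and inode numbers, which is the content-wise test. The first can disagree with the others: if `foo` were itself a symlink, `foo/../bar` need not name the same file as `bar`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    real = os.path.join(d, "bar")
    with open(real, "w") as fh:
        fh.write("x")

    # Simplification: purely lexical, the filesystem is never consulted.
    messy = os.path.join(d, "foo", "..", "bar")
    simplified = os.path.normpath(messy)

    # Resolution: follow a symlink to its target.
    link = os.path.join(d, "link")
    os.symlink(real, link)
    resolved = os.path.realpath(link)
    resolved_real = os.path.realpath(real)

    # Content-wise: a hard link is a second, equally "real" name for the same file.
    hard = os.path.join(d, "hard")
    os.link(real, hard)
    same_file = os.path.samefile(real, hard)   # compares device and inode
```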

There's something that slightly jars me here... I don't like the quotation returning an IO object.
But doesn't normal quoting return a Str object? And regex quoting return an object (Regex? Match? Something, anyway).

Certainly, but a regex doesn't produce a Signature object, say. I don't object to objects, just to creating objects, then doing something with them, then returning another kind of object, and calling that "parsing". If we're parsing the characters, we should end up with an IO::Name. If we end up with an IO::actual-file/stream-whatever, then we should call it something else (like an "io constructor").

The difference in our approaches is that you seem keen to integrate
closely the data and the metadata, whereas I'm trying to integrate the paths
and the metadata.

Well, paths are just metadata too, although typically the most important kind. (You could even have an IO without a path or name.) I want a view that integrates all of them, because that's how people ordinarily think about files, unless they have a specific reason not to.

$ echo $PATH
/home/wayland/local/bin:/usr/global/bin:/usr/local/bin:/bin:/usr/bin:/usr/sbin:/usr/local/sbin:/sbin
       Now, which of these is the path?

Ah, the arguably-poorly-named $PATH is equivocating on the meaning of "path". It's really a path of paths, that is, a search-path of file-paths. In Perl, $*ENV{PATH} should be an array of IO's. In fact, "paths" in Perl should also be arrays -- they'd stringify to a bunch of characters separated by slashes (or something else if the settings say so), but really a path is a bunch of separate dirs, so Perl should represent them that way. All the easier to do common operations like popping one or more segments off the end of a path.
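
Treating a search path as a list of paths, and a path as a list of segments, might look like this in Python. `split_search_path` is a hypothetical helper that assumes POSIX conventions (colon separators); `PurePosixPath.parts`, `.parent` and `.parents` are standard pathlib attributes.

```python
from pathlib import PurePosixPath

def split_search_path(value, sep=":"):
    """Hypothetical helper: treat a $PATH-style string as a list of separate paths."""
    # Empty entries are skipped here; a POSIX shell treats them as the current directory.
    return [PurePosixPath(entry) for entry in value.split(sep) if entry]

dirs = split_search_path("/home/wayland/local/bin:/usr/global/bin:/usr/local/bin:/bin")
print(dirs[0])                # /home/wayland/local/bin

# A single path, in turn, is a sequence of segments.
p = PurePosixPath("/usr/local/bin")
print(p.parts)                # ('/', 'usr', 'local', 'bin')
print(p.parent)               # /usr/local  (one segment popped)
print(p.parents[1])           # /usr        (two segments popped)
```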

if $file ~~ path[./*.txt] {

Is that just a regex, in fact?

       No, we're talking about a globbing sublanguage

Oh, I should have said "Isn't that equivalent to a regex" -- that is, I would consider globs distinct from paths just as regexes are distinct from strings. I guess that since glob-syntax has far fewer special characters than regexes, you could more easily get away with making all path-literals globby, but I'd still want to distinguish them. Again, glob-parsing is not conceptually restricted to filenames (that's just the context we're most familiar with). It would be fine to have Q:glob as a simplified cousin to Q:regex, and it might occasionally be useful to use on plain strings.
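
Python's `fnmatch` module illustrates both halves of this: `fnmatch.translate` turns a glob into an ordinary regex, and glob matching works on any string, not just filenames. It also shows why a path-aware glob should be kept distinct, since `fnmatch` gives `/` no special meaning.

```python
import fnmatch
import re

# The glob "*.txt" rewritten as an ordinary regular expression.
txt_regex = re.compile(fnmatch.translate("*.txt"))
print(bool(txt_regex.match("notes.txt")))   # True
print(bool(txt_regex.match("notes.md")))    # False

# Glob matching is just as useful on plain strings as on filenames.
words = ["apple", "apricot", "banana"]
matches = [w for w in words if fnmatch.fnmatchcase(w, "ap*")]
print(matches)                              # ['apple', 'apricot']

# Unlike a shell, fnmatch gives "/" no special meaning, so "*" crosses directories.
print(fnmatch.fnmatchcase("docs/notes.txt", "*.txt"))   # True
```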

I was wanting to replace the "glob" language with something more like XPath, but that idea was vetoed by people who didn't want Tree-related objects to be part of the core, so I'm doing that as a library.

I'm all for some tree-related fun(ctions). A tree is basically a hash of hashes, so I'm surprised we don't have a few functions for traversing them and other very basic hashy concepts. But I would like to see XPath-type stuff hashed out [pun intended] anyway -- whether it ends up in a third-party module or not isn't such a big deal when it comes to P6, and somebody will have to figure out how to do it in a perlish way eventually.
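
A minimal sketch of a tree as a hash of hashes, written in Python with dicts. `walk` and `select` are hypothetical helpers; `select` supports only a `*` wildcard for a single level and is nowhere near XPath, just enough to show the flavour of a path-style query over nested hashes.

```python
def walk(tree, prefix=()):
    """Hypothetical helper: yield (key-path, value) pairs for every leaf of a dict of dicts."""
    for key, value in tree.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from walk(value, path)
        else:
            yield path, value

def select(tree, pattern):
    """Hypothetical XPath-like lookup: '*' in the pattern matches any single key."""
    return [
        value for path, value in walk(tree)
        if len(path) == len(pattern)
        and all(p == "*" or p == k for p, k in zip(pattern, path))
    ]

fs = {"home": {"wayland": {"notes.txt": 120, "todo.txt": 42}},
      "etc": {"hosts": 300}}
print(select(fs, ("home", "*", "notes.txt")))   # [120]
```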

 if $file.type ~~ MIME("text/plain") {...}

Cool idea.  How would the type be determined?  Are you thinking of
the algorithms in the unix "file" utility?  Please tell me you're not
planning to use filename extensions -- that's bad :).

Wouldn't $file.type be metadata?

Yes; and yes, filename extensions are evil, but of course thanks to primitive filesystems, we're stuck with them to a large extent. And there's no perfect solution, but it would be useful for Perl to stick as closely to the FS/OS's idea of types as it can. Sometimes that would mean looking up an extension; it might mean using (or emulating) "file" magic; it might mean querying the FS for a MIME-type or a UTI. After all, the filename extension may not actually match the correct type of the file.
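
Two of those strategies can be combined as in the Python sketch below: trust a few well-known magic numbers in the content first (in the spirit of `file`), and fall back to an extension lookup via the standard `mimetypes` module. The `MAGIC` table and the functions `sniff` and `file_type` are hypothetical and cover only three signatures; real `file` magic is far larger, and this ignores FS-level MIME types and UTIs entirely.

```python
import mimetypes

# Hypothetical table of a few well-known magic numbers, in the spirit of file(1).
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"\x1f\x8b", "application/gzip"),
]

def sniff(header):
    """Return a MIME type if the leading bytes match a known signature."""
    for signature, mime in MAGIC:
        if header.startswith(signature):
            return mime
    return None

def file_type(name, header):
    """Prefer what the contents say; fall back to the extension."""
    return sniff(header) or mimetypes.guess_type(name)[0]

print(file_type("notes.txt", b"plain words"))              # text/plain
print(file_type("notes.txt", b"\x89PNG\r\n\x1a\n..."))     # image/png
```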


On 2009-Aug-17, at 6:16 pm, Timothy S. Nelson wrote:

The question is, which of the following does "metadata" mean?:
1       The metadata that the filesystem attaches to the file
2       All the information that can be gathered without opening the file
3       Any information we can gather about the file that isn't the actual data contained in the file, but may involve reading at least part of it
4       Something else

Probably all of the above. Strictly speaking, "data" is the contents of a file, and "metadata" is anything else that relates to the file in any way, including information extrapolated from the contents. A byte-order mark is metadata (it tells you something *about* the file) even though it's inside, right? Or a char-set declaration inside an HTML file, or -T....
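
Detecting a byte-order mark is a small example of metadata that lives inside the data: it requires reading the first few bytes, which puts it in Timothy's category 3. A Python sketch, with a hypothetical `detect_bom` helper built on the BOM constants from the standard `codecs` module (UTF-32 marks are checked first because the UTF-32 LE mark begins with the UTF-16 LE mark):

```python
import codecs

# Order matters: BOM_UTF32_LE starts with the same two bytes as BOM_UTF16_LE.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def detect_bom(data):
    """Hypothetical helper: return the encoding implied by a leading BOM, or None."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding
    return None

sample = "hello".encode("utf-8-sig")      # UTF-8 text prefixed with a BOM
print(detect_bom(sample))                  # utf-8-sig
print(sample.decode(detect_bom(sample)))   # hello
print(detect_bom(b"plain ascii"))          # None
```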

Philosophically, there's no hard distinction; only whatever point of view is useful for the task at hand. If you "use IO::Filesystem::Gzip" then you should be able to treat "a" gzip file as a bunch of separate files with separate contents and metadata [even though when you look at it from a different perspective, it's all just "data" in a single .gz file].
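
Python's `tarfile` module gives a comparable filesystem-like view of a compressed archive. Since plain gzip compresses a single stream, the sketch uses a gzip-compressed tar archive; `build_archive` is a hypothetical helper that builds one in memory. From outside, the archive is one blob of data; through `tarfile` it is a set of files, each with its own contents and metadata such as size and mtime.

```python
import io
import tarfile
import time

def build_archive(files):
    """Hypothetical helper: pack {name: bytes} into an in-memory .tar.gz and return its bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = int(time.time())
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

blob = build_archive({"a.txt": b"first", "b.txt": b"second file"})

# Read back through tarfile: separate files, each with its own contents and metadata.
with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
    listing = {m.name: (m.size, tar.extractfile(m).read()) for m in tar.getmembers()}
print(listing)   # {'a.txt': (5, b'first'), 'b.txt': (11, b'second file')}
```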



-David
