On 2009-Aug-18, at 7:20 am, Timothy S. Nelson wrote:
On Tue, 18 Aug 2009, David Green wrote:
Some ways in which different paths can be considered equivalent:
Spelling: ... Simplification: ... Resolution: ... Content-wise: ...
Ok, my next commit will have "canonpath" (stolen directly from p5's
File::Spec documentation), which will do "No physical check on the
filesystem, but a logical cleanup of a path", and "realpath" (idea
taken from p5's Cwd documentation), which will resolve symlinks,
etc., and provide an absolute path. Oh, and "resolvepath", which
does both. I'm not quite sure I followed all your discussion above
-- have I left something out?
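For comparison, Python's standard library draws roughly the same line. Here is a minimal sketch in Python standing in for the proposed Perl 6 methods. os.path.normpath does a purely lexical cleanup, much like canonpath. os.path.realpath asks the filesystem to resolve symlinks and returns an absolute path, closer to realpath. The wrapper names canonpath and realpath are only illustrative and are not an existing Perl 6 API.

```python
import os
import tempfile

def canonpath(path: str) -> str:
    """Logical cleanup only; the filesystem is never consulted."""
    return os.path.normpath(path)

def realpath(path: str) -> str:
    """Physical resolution: follow symlinks and return an absolute path."""
    return os.path.realpath(path)

# '.', '..' and doubled separators are collapsed purely as text (POSIX output shown).
print(canonpath("/foo//bar/./baz/../qux"))   # /foo/bar/qux

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "real")
    os.mkdir(target)
    link = os.path.join(tmp, "link")
    os.symlink(target, link)
    # canonpath has no idea 'link' is a symlink; realpath asks the filesystem.
    print(canonpath(link) == link)              # True
    print(realpath(link) == realpath(target))   # True
```

One caveat argues for keeping the two operations distinct. A lexical cleanup can change what a path means when symlinks are involved. If dir/link points elsewhere, normpath turns dir/link/.. into dir. The filesystem, by contrast, would take you to the parent of the link's target.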
I think there's a difference between "canonical" as in a webpage with
<link rel="canonical">, and "cleanup" as in Windows turning PROGRA~1
into "Program Files". There could also be other types of
normalisation depending on the FS, but we probably shouldn't concern
ourselves with them, other than having some way to get to such native
calls.
Anyway, my assumption is that there should be a number of
comparison options. Since we do Str, we should get string
comparison for free. I'm expecting other options at other
levels too, but have no idea how or what at this point.
As Leon Timmermans keeps reminding us, that really should be delegated
to the OS/FS. I think $file1 =:= $file2 should ask the OS whether it
thinks those are the same item or not (it can check paths, it can
check inodes, whatever is its official way to compare file-thingies).
Similarly, $file1.name === $file2.name should ask the OS whether it
thinks those names mean the same thing. And if you want to compare
the canonical paths or anything else, just say $file1.name.canonical
=== $file2.name.canonical, or use 'eq', or whatever you want to do,
just do it explicitly.
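Here is one way the "ask the OS" version of =:= could look, assuming a POSIX-like filesystem. There, the OS's own notion of "the same item" is the (device, inode) pair returned by stat(), which Python also exposes as os.path.samefile. same_item is a hypothetical name standing in for =:=.

```python
import os
import tempfile

def same_item(path1: str, path2: str) -> bool:
    """True if both paths name the same filesystem object.

    Delegates to the OS by comparing the device and inode numbers from
    stat(), which is what os.path.samefile does on POSIX.
    """
    s1, s2 = os.stat(path1), os.stat(path2)
    return (s1.st_dev, s1.st_ino) == (s2.st_dev, s2.st_ino)

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "data.txt")
    with open(original, "w") as fh:
        fh.write("hello\n")
    hardlink = os.path.join(tmp, "alias.txt")
    os.link(original, hardlink)
    copy = os.path.join(tmp, "copy.txt")
    with open(copy, "w") as fh:
        fh.write("hello\n")

    print(same_item(original, hardlink))         # True: two names, one inode
    print(same_item(original, copy))             # False: same content, different file
    print(os.path.samefile(original, hardlink))  # True: the stdlib equivalent
    # String comparison is a different question, asked explicitly.
    print(original == hardlink)                  # False
```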
According to my last commit, p{} will return a Path object that
just stores the path, but has methods attached for accessing all the
metadata. But it doesn't do file opening or things like that
(unless you use the :T and :B thingies, which read the first block
and try to guess whether it's text or binary -- these are in Perl 5
too).
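Here is a rough Python sketch of the text/binary guess, loosely modeled on Perl 5's -T/-B file tests. Per perlfunc, those tests examine the first block of the file. Valid UTF-8 containing non-ASCII characters counts as text, a zero byte means binary, and so does too high a proportion of odd characters. The 512-byte sample, the 30% threshold, and the set of "ordinary" bytes below are assumptions for illustration, not Perl's precise rules. Perl also reports an empty file as passing both -T and -B, whereas this sketch simply calls it text.

```python
import codecs
import os
import tempfile

BLOCK_SIZE = 512       # assumption: how much of the file to sample
ODD_THRESHOLD = 0.30   # assumption: fraction of odd bytes that tips it to "binary"

# Bytes counted as ordinary text: printable ASCII plus common whitespace.
TEXT_BYTES = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0C, 0x0D}

def looks_like_text(path: str) -> bool:
    """Heuristic in the spirit of Perl 5's -T test, not its exact rules."""
    with open(path, "rb") as fh:
        block = fh.read(BLOCK_SIZE)
    if not block:
        return True                  # empty file: nothing suggests binary
    if b"\x00" in block:
        return False                 # a NUL byte is taken as a sign of binary data
    if any(byte >= 0x80 for byte in block):
        # Valid UTF-8 with non-ASCII characters counts as text. When the block
        # is full (final=False), a character cut off at the end is tolerated.
        decoder = codecs.getincrementaldecoder("utf-8")()
        try:
            decoder.decode(block, final=len(block) < BLOCK_SIZE)
            return True
        except UnicodeDecodeError:
            pass
    odd = sum(1 for byte in block if byte not in TEXT_BYTES)
    return odd / len(block) <= ODD_THRESHOLD

def looks_like_binary(path: str) -> bool:
    return not looks_like_text(path)

with tempfile.TemporaryDirectory() as tmp:
    notes = os.path.join(tmp, "notes.txt")
    with open(notes, "wb") as fh:
        fh.write("plain text, café\n".encode("utf-8"))
    image = os.path.join(tmp, "image.png")
    with open(image, "wb") as fh:
        fh.write(b"\x89PNG\r\n\x1a\n" + bytes(16))
    print(looks_like_text(notes))     # True
    print(looks_like_binary(image))   # True
```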
There are two things going on here: the user-friendly syntax for
casual use, which we basically agree should be something short and
pithy, although we have but begun to shed this bike, I'm sure.
$file = io "/foo/bar";
$file = p{/foo/bar};
$file = Q:p/foo/bar/;
$file = File("/foo/bar");
However we end up spelling it, we want that to give us unified access
to the separate inside parts:
IO::Data # contents of file
IO::Handle # filehandle for using manually
IO::Metadata
IO::Path
I'm not sure why Path isn't actually just part of IO::Metadata...
maybe it's just handy to have it out on its own because pathnames are
so prominent. In any case, $file.size would just be shorthand for
something like $file.io.metadata{size}. The :T and :B tests probably
ought to be part of IO::Data, since they require opening the file to
look at it; I'd rather put them there (vs. ::Metadata, which is all
"outside" info) since plain ol' $file abstracts over that detail
anyway. You can say $file.r, $file.x, $file.T, $file.B, and not care
where those tests live under the hood.
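To make the layering concrete, here is a minimal Python sketch of a front-end object that hides which part answers a question. size, r and x come from stat()/access(), the "outside" metadata. T has to read the contents, so it is delegated to the data part. The class and attribute names (File, Metadata, Data, is_text) are hypothetical stand-ins for the proposed IO::* pieces, and only a few tests are shown.

```python
import os
import tempfile

class Metadata:
    """'Outside' information from the filesystem; never opens the file."""
    def __init__(self, path: str) -> None:
        self.path = path

    @property
    def size(self) -> int:
        return os.stat(self.path).st_size

    def readable(self) -> bool:
        return os.access(self.path, os.R_OK)

    def executable(self) -> bool:
        return os.access(self.path, os.X_OK)

class Data:
    """'Inside' information: answering requires reading the contents."""
    def __init__(self, path: str) -> None:
        self.path = path

    def head(self, n: int = 512) -> bytes:
        with open(self.path, "rb") as fh:
            return fh.read(n)

    def is_text(self) -> bool:
        # Deliberately crude stand-in for a -T style test: no NUL bytes up front.
        return b"\x00" not in self.head()

class File:
    """Front end: callers ask questions without caring which part answers."""
    def __init__(self, path: str) -> None:
        self.path = path
        self.metadata = Metadata(path)
        self.data = Data(path)

    @property
    def size(self) -> int:      # shorthand for file.metadata.size
        return self.metadata.size

    @property
    def r(self) -> bool:
        return self.metadata.readable()

    @property
    def x(self) -> bool:
        return self.metadata.executable()

    @property
    def T(self) -> bool:        # needs the contents, so it is delegated to Data
        return self.data.is_text()

    def open(self, mode: str = "r"):
        """Hand out an ordinary handle for manual use (the IO::Handle part)."""
        return open(self.path, mode)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hello.txt")
    with open(path, "wb") as fh:
        fh.write(b"hello\n")
    f = File(path)
    print(f.size, f.r, f.T)   # 6 True True
```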
We might actually want to distinguish IO::Metadata::Stat from
IO::Metadata::Xattr or something... but that's probably too
FS-specific. I don't think I mind much whether it's IO::Path or
IO::Metadata::Path, or whether both exist as synonyms....
I think we want many of the same things, I'm just expressing them
slightly differently. Let's keep working on this, and hopefully we
end up with something great.
Yes. A great mess! Er, wait, no........
And there's no perfect solution, but it would be useful for Perl to
stick as closely to the FS/OS's idea of types as it can. Sometimes
that would mean looking up an extension; it might mean using (or
emulating) "file" magic; it might mean querying the FS for a MIME
type or a UTI. After all, the filename extension may not actually
match the correct type of the file.
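Here is a small Python illustration of why the extension alone is not enough. The mimetypes module guesses purely from the filename, while a few well-known magic-number prefixes can be checked against the actual bytes. The PNG, PDF and GIF signatures are the real published ones, but the table is a tiny hand-picked sample. type_from_name, type_from_content and sniff_type are hypothetical helpers and are no substitute for libmagic or an OS-level type query.

```python
import mimetypes
import os
import tempfile

# A few well-known magic-number prefixes (a tiny sample, not a full database).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]

def type_from_name(path: str):
    """Guess from the extension only; the file is never opened."""
    return mimetypes.guess_type(path)[0]

def type_from_content(path: str):
    """Guess from the leading bytes; None if no known signature matches."""
    with open(path, "rb") as fh:
        head = fh.read(16)
    for magic, mime in SIGNATURES:
        if head.startswith(magic):
            return mime
    return None

def sniff_type(path: str):
    """Prefer what the content says; fall back to the extension."""
    return type_from_content(path) or type_from_name(path)

with tempfile.TemporaryDirectory() as tmp:
    # A PNG image that has been given a misleading .txt name.
    mislabeled = os.path.join(tmp, "picture.txt")
    with open(mislabeled, "wb") as fh:
        fh.write(b"\x89PNG\r\n\x1a\n" + bytes(8))
    print(type_from_name(mislabeled))   # text/plain
    print(sniff_type(mislabeled))       # image/png
```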
My suggestion would be that it's an interesting idea, but should
maybe be left to a module, since it's not a small problem. Of
course, I'm happy to be overruled by a higher power :). I'd like
the feature; I'm just unsure it deserves core status.
Well, it's all modules anyway... certainly we'll have to rely on
IO::Filesystem::XXX, but I do think this is another area to defer to
the OS's own type-determining functions rather than try to do it all
internally. What we should have, though, is a standard way to
represent the types in Perl so that users know how to deal with them.
I think roles are the obvious choice: if the OS tells you that a file
is HTML, then $file would do IO::Datatype::HTML, which means in turn
it would also do IO::Datatype::Plaintext, and so on.
Of course, if the OS tells you you've got a file that does
IO::Datatype::Illudium-phosdex, and you want to *do* something with
it, you'll need a module that knows what to do with that kind of
file. Perl by itself knows only how to treat it as a string of raw
bytes. Well, or as plain text. So you can treat your HTML file as
plain text, or you can use HTML::Doc::Tree and treat it as something
fancier.
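Python has no roles, but a class hierarchy can sketch the idea that "does HTML" implies "does Plaintext". Generic code then falls back to the most general behaviour it understands. The class names only echo the IO::Datatype::* names above. The MIME-to-class table is a hypothetical example of mapping the OS's answer onto Perl-side types, and it is not a proposal for the actual role names.

```python
class RawBytes:
    """Every file can at least be treated as a string of raw bytes."""
    def __init__(self, content: bytes) -> None:
        self.content = content

    def as_bytes(self) -> bytes:
        return self.content

class Plaintext(RawBytes):
    """Anything textual can also be treated as plain text."""
    def as_text(self, encoding: str = "utf-8") -> str:
        return self.content.decode(encoding)

class HTML(Plaintext):
    """HTML is plain text; treating it as a tree is a job for a module."""

# Hypothetical mapping from the type the OS reports to a Perl-side role.
TYPE_FOR_MIME = {
    "text/plain": Plaintext,
    "text/html": HTML,
}

def wrap(content: bytes, mime: str) -> RawBytes:
    # Types nobody has registered a handler for fall back to raw bytes.
    return TYPE_FOR_MIME.get(mime, RawBytes)(content)

page = wrap(b"<p>Hi</p>", "text/html")
print(isinstance(page, Plaintext))   # True: HTML "does" Plaintext
print(page.as_text())                # <p>Hi</p>
odd = wrap(b"\x01\x02", "application/x-illudium-phosdex")
print(isinstance(odd, Plaintext))    # False: only raw bytes are available
```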
-David