Re: [PATCH] File Spec

Gordon Henriksen Sat, 06 Sep 2003 08:48:47 -0700

Lots of good points.

Something that the Mac OS (even OS X) has which most Unix variants don't is directory IDs and file IDs. The Carbon APIs use an FSSpec structure, which is a volume ID, directory ID, and file name. (Volume ID plus file ID is good enough to identify a file which already exists, but all three of volume ID, directory ID, and file name are needed to create a new file.) It's resilient if the directory is moved, but more importantly it offers very significant performance and memory usage improvements in programs which keep tabs on lots of files (e.g., make). It would be cool if that functionality could be exposed in a portable way, so that Parrot programs would inherit it without having to do much. Not that I think it can be. But it would be cool.
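For comparison, the nearest thing most Unix variants offer is the (device, inode) pair from stat. A minimal sketch in Python, assuming a POSIX-like filesystem; file_identity is a name made up for illustration, and this is only an analogue of (volume ID, file ID), not the Carbon API:

```python
import os

def file_identity(path):
    """Return a (device, inode) pair for the file that path currently names.

    Hypothetical helper: on POSIX-like filesystems the pair stays the same
    when the file is renamed or moved within the same mounted filesystem.
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino)
```

A make-like tool could key its cache on this pair rather than on the pathname. Unlike an FSSpec, though, the pair cannot be used to open the file or to create a new one.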

Java's tackled this. On Unix platforms, Java represents a single volume (/), whereas Classic Mac OS and Windows can have multiple volumes. Mount points are ignored; they're just directories. Each volume has a root directory. Volume names might not be unique (Mac OS)...

As for pathname equivalence, There Be Dragons Here. In particular, each directory (when mount points are treated as directories) could potentially have different equivalence semantics. (E.g., on Mac OS X, consider a UFS [ASCII, case sensitive] mount point beneath an HFS+ / [Unicode, case insensitive], or vice versa...) And then there are hard links and symlinks...

On Wednesday, September 3, 2003, at 09:00 , [EMAIL PROTECTED] wrote:

On Mon, 1 Sep 2003, Michael G Schwern wrote:

  You also must worry about volumes.
  Unix: No user visible concept of a volume
  Windows: VOLUME:\dir1\dir2\file
  VMS: VOLUME:[dir1.dir2]file

This has been worrying me for some years. The concept of "volume" has different implications for different platforms.

[please excuse long rambling explanation...]

One could argue that the mount points in Unix, though normally invisible, are "volumes" in the sense that they do affect the semantics of certain system calls, most especially "rename" and "link", but depending on mount options also "open", "write", "ioctl" and others. Making them visible is normally exorbitantly expensive though, so you don't want to do so unless absolutely necessary.

It's also clear that the relationships between "volume" and "root directory" differ. For Mac, volumes are within a pseudo root directory, whereas for Win32 a root directory exists on a volume. So although they share the same names, they aren't really portable concepts in any meaningful way.

What these various OSes do share is a concept of "current locus" (or loci) within some filename space.

* On Unix both the "working" and "root" directories can be changed;

* On Windows the current (working) directory is a feature of the current volume; changing to another volume and back again will bring you to the same working directory, even if you changed the current directory on another volume. (This behaviour changes between different versions of Windows.)

* On Classic Mac (and VMS?) only the "working directory" can be changed; the "root directory" is faked to be the top of the startup volume;

* On RMX an arbitrary number [*] of "current loci" can be established, and referred to as if they were independent volumes, or accessed by open "handles" (much like file descriptors); the standard C library uses these to fake the behaviour of various POSIX functions, but these loci can be shared between processes and thus the POSIX emulation can be fooled.

* Similarly, versions of Unix which have "fchdir" and/or "fchroot" allow a working directory or root directory to be selected from an arbitrary number of already-opened directories;

* Some (ancient) systems don't have any directory hierarchy, so a root directory is meaningless.

But also importantly, in the general case it is not possible to determine a path between two loci, and in particular between a root directory and a working directory.

* In Unices with "fchdir" it is possible to have a current working directory that is outside the current root directory;

* Filesystem permissions may prevent traversing from one locus to another; (normally this would prevent construction of a path from one to the other, but even given such a path, it might not be usable)

The more important question is how do we interpret these things to decide if certain operations should reasonably be expected to succeed? Give or take ownership issues of course...

Some of them we already can do somewhat portably:

* How do we take the results of "readdir" and make them usable?

* If we use "chdir", how do we later get back to the same working directory?

* Is a given filename dependent on the working directory?

* Do two pathnames A and B refer to the same entity?
    Just by inspecting the pathnames?
    By checking whether they're links to the same file (inode)?

* Do two pathnames A and B refer to entities in the same directory?

If so then we can assume that if permissions allow us to access A then they will probably also allow us to access B. Not that we shouldn't check the results of both attempts of course, but if one succeeds and the other fails then we would be excused for just bailing instead of trying harder.
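To make a few of those "somewhat portable" questions concrete, here is a rough sketch in Python. The helper names are invented for illustration; os.path.samefile compares device and inode, and os.fchdir is only available on Unix-like systems, so this is an assumption-laden example rather than a portable answer:

```python
import contextlib
import os

def same_entity(a, b):
    """True if both existing paths resolve to the same file (device and inode)."""
    return os.path.samefile(a, b)

def same_directory(a, b):
    """True if a and b are entries of the same directory.

    Only the containing directories must exist; the entries need not.
    """
    return os.path.samefile(os.path.dirname(a) or os.curdir,
                            os.path.dirname(b) or os.curdir)

@contextlib.contextmanager
def working_directory(path):
    """chdir to path, then return through a saved descriptor (needs os.fchdir)."""
    saved = os.open(os.curdir, os.O_RDONLY)
    try:
        os.chdir(path)
        yield
    finally:
        # Going back by descriptor works even if the old directory was renamed.
        os.fchdir(saved)
        os.close(saved)
```

Note that none of these inspect the pathname text alone; they all ask the filesystem, which sidesteps the equivalence-semantics problem at the cost of a system call.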

Some of them are a lot harder to do portably:

* Can we rename a file from name A to name B? A directory? If it's one that we just created? One that we got from "readdir"?

How can we construct A from B or B from A to guarantee that we can?

Roughly this translates to "are A and B on the same volume?" unless you're on Unix where we pretend that there aren't any volumes...

* How do we do transactional file replacement? That is, either replace a target file with a complete replacement, or not at all.

On Unix we do this by creating a temporary file in the same directory and, once it has been completely written, renaming it to replace the target atomically. To roll back the transaction, we just delete the temporary file instead.

Assuming this method is possible for another OS, how do we construct the temporary filename from the target filename?
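The Unix recipe just described might look like this in Python. It is a sketch under assumptions: replace_file_atomically is a hypothetical name, and the temporary name is derived simply by asking for a unique file in the target's own directory, which answers the naming question only for systems where that is allowed:

```python
import contextlib
import os
import tempfile

def replace_file_atomically(target, data):
    """Replace target with data (bytes) completely, or leave it untouched."""
    directory = os.path.dirname(target) or os.curdir
    # The temporary file lives in the same directory as the target, so the
    # final rename never has to cross a mount point.
    fd, tmp = tempfile.mkstemp(dir=directory,
                               prefix=os.path.basename(target) + ".",
                               suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, target)  # commit: rename over the target
    except BaseException:
        # Roll back: delete the temporary file and leave the target alone.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise
```

The rename is atomic on POSIX filesystems; whether the same holds elsewhere is exactly the open question.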

* Can we create a hard link from name A to name B? A symbolic link?

How can we construct A from B or B from A to guarantee that we can?

Given two pathnames A and B, how do we make the shortest relative path C between them (to use for a relative symbolic link)?

On Unix you can create a hard link anywhere under the same mount point; on Win-NT4-POSIX links can only be created within the same directory.
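For the relative-path question, a purely lexical answer is easy on Unix; the sketch below uses Python's os.path.relpath, with relative_link_target being a made-up name. It assumes no intermediate directory in either path is itself a symlink, which is precisely the case where the lexical answer can be wrong:

```python
import os

def relative_link_target(target, link):
    """Compute the text to store in a symlink at link so it points at target.

    Lexical only: the result is relative to the directory containing link,
    and no symlinks along the way are resolved.
    """
    return os.path.relpath(target, start=os.path.dirname(link) or os.curdir)
```

For example, a link at /a/d/link pointing at /a/b/c/file would store ../b/c/file.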

* If we rename a symlink that has name A and points at B to name C, will it still refer to the same file?

How can we construct A+B from C or C from A+B to guarantee that it will?

If we can't, how do we create a new symlink D that *will* refer to the same file? Or a new name E which it will refer to?

And do all the above without requiring A and C to be in the same directory?

I would strongly recommend deprecating any distinction between "volume" and "path", and instead provide functions which focus on allowing us to answer the above questions in simple portable ways.

But in the end, since Windows, MacOS, VMS and even RMX all have POSIX emulation, do we really care? Maybe we should just have functions for "convert native name to POSIX" and "convert POSIX name to native" and be done with it?
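As a rough illustration of what such a pair of functions could look like for Windows names, here is a sketch using Python's pathlib. The function names are hypothetical, and the conversion is purely lexical: it swaps separators but does not map a drive onto a mount point the way a real POSIX emulation layer would.

```python
from pathlib import PureWindowsPath

def windows_to_posix(native):
    """Lexically convert a Windows-style name to forward-slash form."""
    return PureWindowsPath(native).as_posix()

def posix_to_windows(posix):
    """Lexically convert a forward-slash name back to Windows-style form."""
    return str(PureWindowsPath(posix))
```

So VOLUME:\dir1\dir2\file with VOLUME being C would become C:/dir1/dir2/file and back again; a VMS or RMX equivalent would need its own parser.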

-Martin

[* Ok, an "arbitrary" number really means a 32-bit number -- or smaller]

PS: don't forget, I said "give or take filesystem permissions"

—

Gordon Henriksen
[EMAIL PROTECTED]
