Re: [PATCH] File Spec

2003-09-06 Thread Gordon Henriksen
Lots of good points.

Something that the Mac OS (even OS X) has which most Unix variants don't 
is directory IDs and file IDs. The Carbon APIs use an FSSpec structure, 
which is a volume ID, directory ID, and file name. (Volume ID plus file ID 
is good enough to identify a file which exists already, but all three of 
volume ID, directory ID, and file name are needed to create a new file.) 
It's resilient if the directory is moved but, more importantly, actually 
offers very significant performance and memory usage improvements in 
programs which keep tabs on lots of files (e.g., make). It would be cool if 
that functionality could be exposed in a portable way, so that parrot 
programs would inherit it without having to do much. Not that I think it 
can be. But it would be cool.

Java's tackled this. On Unix platforms, Java represents a single 
volume (/), whereas Classic Mac OS and Windows can have multiple 
volumes. Mount points are ignored; they're just directories. Each volume 
has a root directory. Volume names might not be unique (Mac OS)...

As for pathname equivalence, There Be Dragons Here. In particular, each 
directory (when mount points are treated as directories) could 
potentially have different equivalence semantics. (e.g., on Mac OS X, 
consider a UFS [ASCII, case sensitive] mount point beneath an HFS+ / 
[Unicode, case insensitive], or vice versa...) And hard links and 
symlinks...
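
Because equivalence rules can change at any directory, one way to find out is to probe the directory in question. The following is only a sketch in Python, not anything Parrot provides; the function name directory_is_case_sensitive and the "CaseProbe" prefix are made up for illustration, and the probe needs write permission in the directory.

```python
import os
import tempfile

def directory_is_case_sensitive(directory):
    """Probe one directory: does a name with swapped case hit the same file?"""
    # The fixed prefix guarantees the generated name contains letters.
    with tempfile.NamedTemporaryFile(dir=directory, prefix="CaseProbe") as probe:
        name = os.path.basename(probe.name)
        swapped = os.path.join(directory, name.swapcase())
        # On a case-insensitive filesystem the swapped name finds the probe file.
        return not os.path.exists(swapped)
```

The answer only holds for the directory that was probed; a mount point one level down may behave differently, which is exactly the dragon described above.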

On Wednesday, September 3, 2003, at 09:00, [EMAIL PROTECTED] wrote:

[ long quoted explanation snipped ]

Re: [PATCH] File Spec

2003-09-05 Thread martin
On Thu, 4 Sep 2003 [EMAIL PROTECTED] wrote:
 On Mon, 1 Sep 2003, Michael G Schwern wrote:
  You also must worry about volumes.
[my long explanation snipped]

Sorry, wrong list; this is a standard-module issue, not an implementation
issue or even a core-language issue.

-Martin




Re: [PATCH] File Spec

2003-09-04 Thread martin
On Mon, 1 Sep 2003, Michael G Schwern wrote:
 You also must worry about volumes.
 Unix: No user visible concept of a volume
 Windows: VOLUME:\dir1\dir2\file
 VMS: VOLUME:[dir1.dir2]file

This has been worrying me for some years. The concept of volume has
different implications for different platforms.

[please excuse long rambling explanation...]

One could argue that the mount points in Unix, though normally invisible,
are volumes in the sense that they do affect the semantics of certain
system calls, most especially rename and link, but depending on mount
options also open, write, ioctl and others. Making them visible is
normally exorbitantly expensive though, so you don't want to do so unless
absolutely necessary.

It's also clear that the relationships between volume and root directory
differ. For Mac, volumes are within a pseudo root directory, whereas for Win32
a root directory exists on a volume. So although they share the same names,
they aren't really portable concepts in any meaningful way.

What these various OSes do share is a concept of current locus (or loci)
within some filename space.

  * On Unix both the working and root directories can be changed;

  * On Windows the current (working) directory is a feature of the current
volume; changing to another volume and back again will bring you to the
same working directory, even if you changed the current directory on
another volume.
(This behaviour changes between different versions of Windows.)

  * On Classic Mac (and VMS?) only the working directory can be changed; the
root directory is faked to be the top of the startup volume;

  * On RMX an arbitrary number [*] of current loci can be established, and
referred to as if they were independent volumes, or accessed by open
handles (much like file descriptors); the standard C library uses these to
fake the behaviour of various POSIX functions, but these loci can be
shared between processes and thus the POSIX emulation can be fooled.

  * Similarly, versions of Unix which have fchdir and/or fchroot allow a
working directory or root directory to be selected from an arbitrary number
of already-opened directories;

  * Some (ancient) systems don't have any directory hierarchy, so a root
directory is meaningless.

But also importantly, in the general case it is not possible to determine a
path between two loci, and in particular between a root directory and a working
directory.

  * In Unices with fchdir it is possible to have a current working directory
that is outside the current root directory;

  * Filesystem permissions may prevent traversing from one locus to another;
(normally this would prevent construction of a path from one to the other,
but even given such a path, it might not be usable)
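
Where fchdir exists, a locus can be remembered by handle instead of by path, which sidesteps the "no path between two loci" problem when returning to a working directory. A minimal Python sketch, assuming a Unix-like platform with os.fchdir; the helper name returning_to_cwd is hypothetical:

```python
import contextlib
import os

@contextlib.contextmanager
def returning_to_cwd():
    """Remember the working directory by open handle, not by name."""
    fd = os.open(".", os.O_RDONLY)   # handle on the current locus
    try:
        yield
    finally:
        os.fchdir(fd)                # go back even if no usable path exists
        os.close(fd)
```

Used as "with returning_to_cwd(): os.chdir(somewhere)", the process ends up where it started even if the original directory was renamed in the meantime.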

The more important question is how do we interpret these things to decide if
certain operations should reasonably be expected to succeed? Give or take
ownership issues of course...

Some of them we already can do somewhat portably:

  * How do we take the results of readdir and make them usable?

  * If we use chdir, how do we later get back to the same working directory?

  * Is a given filename dependent on the working directory?

  * Do two pathnames A and B refer to the same entity?
Just by inspecting the pathnames?
By checking whether they're links to the same file (inode)?

  * Do two pathnames A and B refer to entities in the same directory?

If so then we can assume that if permissions allow us to access A then they
will probably also allow us to access B.  Not that we shouldn't check the
results of both attempts of course, but if one succeeds and the other fails
then we would be excused for just bailing instead of trying harder.
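
The "same entity" and "same directory" questions can be sketched by comparing device and inode numbers rather than inspecting names. This is a Python illustration under Unix assumptions (meaningful st_dev and st_ino); same_entity and same_directory are made-up names, and the directory part is taken lexically, so a symlinked final component is not resolved.

```python
import os

def same_entity(path_a, path_b):
    """True if both names resolve to the same file (same device and inode)."""
    st_a = os.stat(path_a)   # follows symlinks; raises OSError if missing
    st_b = os.stat(path_b)
    return (st_a.st_dev, st_a.st_ino) == (st_b.st_dev, st_b.st_ino)

def same_directory(path_a, path_b):
    """True if the entries named by A and B live in the same directory."""
    dir_a = os.path.dirname(os.path.abspath(path_a))
    dir_b = os.path.dirname(os.path.abspath(path_b))
    return same_entity(dir_a, dir_b)
```

Python's standard library offers os.path.samefile for the first check; the explicit version is shown only to make the inode comparison visible.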

Some of them are a lot harder to do portably:

  * Can we rename a file from name A to name B? A directory?
If it's one that we just created? One that we got from readdir?

How can we construct A from B or B from A to guarantee that we can?

Roughly this translates to "are A and B on the same volume?" unless
you're on Unix where we pretend that there aren't any volumes...
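
On Unix the hidden volume shows up as the device number, so a rough "same volume?" test is possible. The sketch below is Python with a hypothetical name, probably_same_volume; it is only a hint, since some setups (bind mounts, for example) can share a device number and still refuse the rename, so the result of rename itself must still be checked.

```python
import os

def probably_same_volume(path_a, path_b):
    """Guess whether rename(A, B) can work by comparing device numbers.

    B may not exist yet, so the directory that would contain it is examined.
    """
    dev_a = os.stat(path_a).st_dev
    parent_b = os.path.dirname(os.path.abspath(path_b))
    dev_b = os.stat(parent_b).st_dev
    return dev_a == dev_b
```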

  * How do we do transactional file replacement? That is, either replace
a target file with a complete replacement, or not at all.

On Unix we do this by creating a temporary file in the same directory and
once it has been completely written, renaming it to replace the target
atomically. Or just deleting it to roll back the transaction.

Assuming this method is possible for another OS, how do we construct the
temporary filename from the target filename?
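
The Unix recipe just described can be sketched as follows. This is a Python illustration, not a proposal for Parrot's API; replace_file_atomically is a made-up name. It avoids constructing the temporary name by hand: it asks for a unique file in the target's own directory, which keeps the rename on one volume.

```python
import os
import tempfile

def replace_file_atomically(target, data):
    """Write data to a temp file next to target, then rename over target."""
    directory = os.path.dirname(os.path.abspath(target))
    fd, tmp_name = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())     # get the bytes to disk before committing
        os.replace(tmp_name, target)   # commit: atomic rename on POSIX
    except BaseException:
        try:
            os.unlink(tmp_name)        # roll back: target is left untouched
        except FileNotFoundError:
            pass
        raise
```

Whether the final rename is atomic on a non-Unix system is precisely the open question raised above.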

  * Can we create a hard link from name A to name B? A symbolic link?

How can we construct A from B or B from A to guarantee that we can?

Given two pathnames A and B, how do we make the shortest relative path C
between them (to use for a relative symbolic link)?

On Unix you can create a hard link anywhere under the same mount 

Re: [PATCH] File Spec

2003-09-04 Thread Leopold Toetsch
[EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

[ snipped a lot of explanations ]

Please keep in mind, that the intended usage inside Parrot just should
be to locate some standard include or extension files for Parrot
internals. More abstraction and complexity can always be added above
that or implemented by HLLs.

leo


Re: [PATCH] File Spec

2003-09-04 Thread Chris Allan
Leopold Toetsch wrote:
[EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

[ snipped a lot of explanations ]

Please keep in mind, that the intended usage inside Parrot just should
be to locate some standard include or extension files for Parrot
internals. More abstraction and complexity can always be added above
that or implemented by HLLs.
leo


Is there a plan for operating systems without Unix-like hierarchical 
directory structures (eg IBM I-Series, I think z/OS, I'd assume many 
other enterprise OSs)?  There are further difficulties in that some of 
these have multiple filesystems which look totally different from each 
other etc.

In general how much effort is it likely to be to get Parrot working on 
systems which don't look at all like Unix?  I've tried to get Perl 5 to 
build on os/400 before and it wasn't a pleasant experience.  Any chance 
it'll be easier to port Parrot?

Chris



Re: [PATCH] File Spec

2003-09-02 Thread Vladimir Lipskiy
 Though I haven't been following this thread, it seems you're coming up
 with some File::Spec-like thing for Parrot?

Exactly.

 I'd recommend looking at Ken Williams' excellent Path::Class module

Surely, I will.

 So yes, you must distinguish between concatenating directories and files.
 
 You also must worry about volumes.

Yeah .. I'll consider that.

Thanks a lot, Michael





Re: [PATCH] File Spec

2003-09-02 Thread Michael G Schwern
Though I haven't been following this thread, it seems you're coming up
with some File::Spec-like thing for Parrot?

I'd recommend looking at Ken Williams' excellent Path::Class module
which gives you actual file and directory objects.  EXTREMELY useful when
you're in an ultra-cross platform environment such as Parrot.  I wish I
had them for MakeMaker instead of fucking around with File::Spec.  Consider
using Path::Class for inspiration rather than File::Spec.


On Mon, Sep 01, 2003 at 02:38:36PM +0300, Vladimir Lipskiy wrote:
 Leo wrote:
  Albeit File::Spec is using catfile and catdir, I don't like the function
  names ("cat file" is on *nix what "type file" is on Win*). Maybe
  concat_pathname and concat_filename are better.
 
 Yes, indeed. I'm for having concat_pathname only, since neither this patch
 nor the File::Spec module makes any distinction between concatenating paths
 and files (though I can be mistaken on account of VMS, Dan? (~:). So catdir
 and catfile give the same result. Moreover, catfile is sort of a wrapper
 around catdir and does nothing smarter than just calling catdir on all
 platforms.

On VMS catfile and catdir do very different things because VMS filepath
syntax distinguishes between files and directories explicitly.

Unix:
/dir1/dir2/dir3
/dir1/dir2/file

Windows:
\dir1\dir2\dir3
\dir1\dir2\file

VMS:
[dir1.dir2.dir3]
[dir1.dir2]file

So yes, you must distinguish between concatenating directories and files.
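
The three syntaxes above can be reproduced with a toy pair of functions. This is a Python sketch for illustration only: it handles just the absolute, volume-less forms shown in the examples, and the "syntax" argument is an assumption of the sketch, not File::Spec's or Parrot's interface.

```python
def catdir(syntax, *dirs):
    """Join directory names into a directory path."""
    if syntax == "unix":
        return "/" + "/".join(dirs)
    if syntax == "windows":
        return "\\" + "\\".join(dirs)
    if syntax == "vms":
        return "[" + ".".join(dirs) + "]"
    raise ValueError("unknown syntax: " + syntax)

def catfile(syntax, *parts):
    """Join directory names plus a final file name into a file path."""
    if syntax in ("unix", "windows"):
        # A file is just the last component here, so catdir is enough.
        return catdir(syntax, *parts)
    if syntax == "vms":
        *dirs, filename = parts
        # The file name goes outside the brackets, unlike a directory.
        return catdir(syntax, *dirs) + filename
    raise ValueError("unknown syntax: " + syntax)
```

On Unix and Windows the two functions really do collapse into one; only the VMS branch needs to know which component is the file.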

You also must worry about volumes.

Unix:
No user visible concept of a volume

Windows:
VOLUME:\dir1\dir2\file

VMS:
VOLUME:[dir1.dir2]file


-- 
Michael G Schwern    [EMAIL PROTECTED]    http://www.pobox.com/~schwern/
Operation Thrusting Peach


Re: [PATCH] File Spec

2003-09-01 Thread Leopold Toetsch
Vladimir Lipskiy [EMAIL PROTECTED] wrote:

[ my first answer seems to be missing ]

 From: Leopold Toetsch [EMAIL PROTECTED]
 Subject: TWEAKS: Takers Wanted - Effort And Knowledge Sought

 Platform code
 -------------
We need some functions to deal with paths and files like File::Spec.
For loading include files or runtime extension some search path should
be available to locate these files (a la use lib LIST;).
For now runtime/parrot/{include,dynext} and the current working
directory would be sufficient.

 I ain't 100% sure what Leo wanted there and am afraid that my patch is out of
 place.  Though it presents rudimentary support for the Parrot File::Spec-like
 functions which are as follows: curdir, catdir, catfile.

Albeit File::Spec is using catfile and catdir, I don't like the function
names ("cat file" is on *nix what "type file" is on Win*). Maybe
concat_pathname and concat_filename are better.

 I should warn you the patch lacks any documentation. Examples of
 usage can be found in file_spec.t. Nevertheless, does it need some
 documentation written for non-perl folks, and if it does, where should I put it?
 The docs directory?

docs/dev is the place for documents about internal functionality and
design decisions.

WRT the patch - please can people having experience with different
platforms have a look at it, if the functionality would be able to cope
with all platform weirdness.

 =head1 NAME

[ can you switch your mailer to plain text, thanks ]
[ WRT diff: make a copy of your original tree, do modifications there
and then cd ..; diff -urN parrot parrot-modified ]

Thanks,
leo


Re: [PATCH] File Spec

2003-09-01 Thread Vladimir Lipskiy
Leo wrote:
 Albeit File::Spec is using catfile and catdir, I don't like the function
 names ("cat file" is on *nix what "type file" is on Win*). Maybe
 concat_pathname and concat_filename are better.

Yes, indeed. I'm for having concat_pathname only, since neither this patch
nor the File::Spec module makes any distinction between concatenating paths
and files (though I can be mistaken on account of VMS, Dan? (~:). So catdir
and catfile give the same result. Moreover, catfile is sort of a wrapper
around catdir and does nothing smarter than just calling catdir on all
platforms.

We can bring concat_filename in as well (I don't mind) but as an alias of
concat_pathname. I don't know how to implement this (I mean aliasing)
in terms of parrot, though. Can we do it in some elegant way?

However, for consistency's sake, I really really want us to have only
concat_pathname, because whether we concatenate dirs or
dirs and a file we always do the same thing -- concatenate a path.

 docs/dev is the place for documents about internal functionality and
 design decisions.

Okay.

 WRT the patch - please can people having experience with different
 platforms have a look at it, if the functionality would be able to cope
 with all platform weirdness.

For the time being, it works properly only on Windows and Unix platforms.
Why is that so? I feel I should give some explanation of how it works.

There is only one generic function, catdir, rather than the many we have in
File::Spec. And there are some filters[1], which we can assign to an array
Filters.

typedef void (*ParrotFSFilter)(struct Parrot_Interp *, STRING **);

ParrotFSFilter Filters[] = {
 filter_1,
 filter_2,
  ...  ,
 filter_n
};

When we have PASM code such as

set S0, "foo_dir"
set S1, "bar_dir"
catdir S0, S1

it first calls the file_spec_catdir() function, which only glues the
parts together with an OS-specific directory separator and passes control
to another function, file_spec_filter(). No doubt after the gluing
a path can contain some trash like successive slashes; that's why we
always call file_spec_filter, which in its turn calls each function
registered in the Filters array. Filters can be OS-specific: there is no
sense in registering a filter that does the xx///xx -> xx/xx change when
you are working on cygwin. Another question is how we can add an OS-specific
filter -- there is nothing to it:

ParrotFSFilter Filters[] = {
    file_spec_some_filter,
#ifndef PARROT_OS_NAME_IS_CYGWIN
    file_spec_successive_slashes_filter,
#endif
    file_spec_filter_which_deletes_redundant_root_direct,
#ifdef VMS
    file_spec_vms_specific_filter,
#endif
    file_spec_yet_another_filter
    /* and so on */
};

If somebody imagines a plan that could manage without macroing,
you know, ideas are always welcome.
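
One possible plan without macros is to build the filter table at start-up from a run-time platform check. The sketch below is Python rather than Parrot C, purely to show the shape; every name in it (build_filters, collapse_slashes, strip_trailing_slash, and this concat_pathname) is hypothetical, and the two filters are stand-ins for the real ones.

```python
import sys

def collapse_slashes(path):
    """xx///xx -> xx/xx, done with plain find and replace."""
    while "//" in path:
        path = path.replace("//", "/")
    return path

def strip_trailing_slash(path):
    """xx/ -> xx, but leave a bare "/" alone."""
    return path[:-1] if len(path) > 1 and path.endswith("/") else path

def build_filters(platform=sys.platform):
    """Choose the filters for this platform at run time instead of #ifdef."""
    filters = []
    if platform != "cygwin":
        filters.append(collapse_slashes)
    filters.append(strip_trailing_slash)
    return filters

def concat_pathname(parts, separator="/", filters=None):
    """Glue the parts together, then run every registered filter in order."""
    if filters is None:
        filters = build_filters()
    path = separator.join(parts)
    for apply_filter in filters:
        path = apply_filter(path)
    return path
```

In C the same idea would be a function that fills the Filters array once during interpreter start-up; the trade-off is that every filter must then compile on every platform.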

Now that you know how it's supposed to work, I can return to
the question of why it works properly only on Windows and Unix
platforms. The answer is: the filters haven't been implemented yet,
because I am still undecided about what would be the best
solution for find 'n' replace actions, and I wish I could hear some
comments on that. To clarify what the heck I'm talking about, here is
the fragment that I cut out of my initial mail:



Next. In the future I'll need to be able to do some find 'n' replace
actions in order to clean the trash out of paths. The perl version
uses regexes like these:

$path =~ s|/+|/|g unless($^O eq 'cygwin');   # xx////xx  -> xx/xx
$path =~ s|(/\.)+/|/|g;                      # xx/././xx -> xx/xx
$path =~ s|^(\./)+||s unless $path eq "./";  # ./xx      -> xx
$path =~ s|^/(\.\./)+|/|s;                   # /../../xx -> xx
$path =~ s|/\Z(?!\n)|| unless $path eq "/";  # xx/       -> xx

The question is whether I should take advantage of string_str_index,
string_replace and friends, or whether there is a better solution. In any
case it never uses long paths, so we won't be heavily penalized
whichever find 'n' replace scheme we use.
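
Whichever scheme is chosen, a direct transliteration of the five substitutions is handy as a reference to test against. The following Python sketch mirrors the Perl lines one for one; the function name canonpath_reference and the is_cygwin flag are assumptions of the sketch. Note that Python's \Z matches only at the very end of the string, so the Perl (?!\n) guard is not needed.

```python
import re

def canonpath_reference(path, is_cygwin=False):
    """Apply the five Perl clean-up substitutions in the same order."""
    if not is_cygwin:
        path = re.sub(r"/+", "/", path)            # collapse runs of slashes
    path = re.sub(r"(?:/\.)+/", "/", path)         # drop /./ components
    if path != "./":
        path = re.sub(r"^(?:\./)+", "", path)      # drop leading ./
    path = re.sub(r"^/(?:\.\./)+", "/", path)      # /../ at the root is the root
    if path != "/":
        path = re.sub(r"/\Z", "", path)            # drop one trailing slash
    return path
```

A string_str_index/string_replace implementation can then be checked by feeding both versions the same list of awkward paths and comparing results.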



There is one more thing to be said: in some cases a result obtained
with the parrot file spec will diverge from a result obtained with the perl
one. For instance,

set S0, ""
set S1, ""
concat_pathname S0, S1
print S1

prints "", but File::Spec's equivalent

my $path = catdir("", "");
print $path;

prints "/" on UNIX, windows, and so forth. I don't think it's the Right
result, though you can argue with me on that account. I'm gonna document all
divergences.

 [ can you switch your mailer to plain text, thanks ]

Yep. I regularly do that. But sometimes my MTA outwits me.

 [ WRT diff: make a copy of your original tree, do modifications there
 and then cd ..; diff -urN parrot parrot-modified ]

Thanks, indeed. I'll try that as soon as I prepare a new patch.



[PATCH] File Spec

2003-08-26 Thread Vladimir Lipskiy
- Original Message -
From: Leopold Toetsch [EMAIL PROTECTED]
Sent: Thursday, August 07, 2003 12:51 PM
Subject: TWEAKS: Takers Wanted - Effort And Knowledge Sought

 Platform code
 -------------
We need some functions to deal with paths and files like File::Spec.
For loading include files or runtime extension some search path should
be available to locate these files (a la use lib LIST;).
For now runtime/parrot/{include,dynext} and the current working
directory would be sufficient.

I ain't 100% sure what Leo wanted there and am afraid that my patch is out of
place.  Though it presents rudimentary support for the Parrot File::Spec-like
functions which are as follows: curdir, catdir, catfile.

I should warn you the patch lacks any documentation. Examples of
usage can be found in file_spec.t. Nevertheless, does it need some
documentation written for non-perl folks, and if it does, where should I put it?
The docs directory?

Next. In the future I'll need to be able to do some find 'n' replace
actions in order to clean the trash out of paths. The perl version
uses regexes like these:

$path =~ s|/+|/|g unless($^O eq 'cygwin');   # xx////xx  -> xx/xx
$path =~ s|(/\.)+/|/|g;                      # xx/././xx -> xx/xx
$path =~ s|^(\./)+||s unless $path eq "./";  # ./xx      -> xx
$path =~ s|^/(\.\./)+|/|s;                   # /../../xx -> xx
$path =~ s|/\Z(?!\n)|| unless $path eq "/";  # xx/       -> xx

The question is whether I should take advantage of string_str_index,
string_replace and co., or whether there is a better solution. In any
case it never uses long paths, so we won't be heavily penalized
whichever find 'n' replace scheme we use.

The last. I beg to be excused: I couldn't prepare unified diffs
of file.ops, file_spec.c, file_spec.h, and file_spec.t with
diff -N -u. Alas. The best I got was:

cvs server: I know nothing about file.ops
cvs server: I know nothing about file_spec.c
cvs server: I know nothing about include/parrot/file_spec.h
cvs server: I know nothing about t/op/file_spec.t

Probably -N works only with files that have already been added
or removed and I have no write access to add those files to
the repository. I won't be surprised if oops! I did something
wrong again.

Comments, requests, threats are welcome, you know.











file_spec.diff
Description: Binary data


file.ops
Description: Binary data


file_spec.c
Description: Binary data


file_spec.h
Description: Binary data


file_spec.t
Description: Binary data