Re: [PATCH] File Spec
Lots of good points.

Something that the Mac OS (even OS X) has which most Unix variants don't is directory IDs and file IDs. The Carbon APIs use an FSSpec structure, which is a volume ID, directory ID, and file name. (A volume ID plus file ID is enough to identify a file which already exists, but all of the volume ID, directory ID, and file name are needed to create a new file.) It's resilient if the directory is moved, but more importantly it offers very significant performance and memory usage improvements in programs which keep tabs on lots of files (e.g., make). It would be cool if that functionality could be exposed in a portable way, so that parrot programs would inherit it without having to do much. Not that I think it can be. But it would be cool.

Java's tackled this. On Unix platforms, Java represents a single volume (/), whereas Classic Mac OS and Windows can have multiple volumes. Mount points are ignored; they're just directories. Each volume has a root directory. Volume names might not be unique (Mac OS)...

As for pathname equivalence, There Be Dragons Here. In particular, each directory (when mount points are treated as directories) could potentially have different equivalence semantics. (E.g., on Mac OS X, consider a UFS [ASCII, case sensitive] mount point beneath an HFS+ / [Unicode, case insensitive], or vice versa...) And hard links and symlinks...

On Wednesday, September 3, 2003, at 09:00, [EMAIL PROTECTED] wrote:
> On Mon, 1 Sep 2003, Michael G Schwern wrote:
>> You also must worry about volumes.
>>
>> Unix: No user visible concept of a volume
>> Windows: VOLUME:\dir1\dir2\file
>> VMS: VOLUME:[dir1.dir2]file
>
> This has been worrying me for some years. The concept of volume has different implications for different platforms. [please excuse long rambling explanation...]
Re: [PATCH] File Spec
On Thu, 4 Sep 2003 [EMAIL PROTECTED] wrote: On Mon, 1 Sep 2003, Michael G Schwern wrote: You also must worry about volumes. [my long explanation snipped] Sorry, wrong list; this is a standard-module issue, not an implementation issue or even a core-language issue. -Martin
Re: [PATCH] File Spec
On Mon, 1 Sep 2003, Michael G Schwern wrote:
> You also must worry about volumes.
>
> Unix: No user visible concept of a volume
> Windows: VOLUME:\dir1\dir2\file
> VMS: VOLUME:[dir1.dir2]file

This has been worrying me for some years. The concept of volume has different implications for different platforms.

[please excuse long rambling explanation...]

One could argue that the mount points in Unix, though normally invisible, are volumes in the sense that they do affect the semantics of certain system calls, most especially rename and link, but depending on mount options also open, write, ioctl and others. Making them visible is normally exorbitantly expensive though, so you don't want to do so unless absolutely necessary.

It's also clear that the relationships between volume and root directory differ. For Mac, volumes are within a pseudo root directory, whereas for Win32 a root directory exists on a volume. So although they share the same names, they aren't really portable concepts in any meaningful way.

What these various OSes do share is a concept of current locus (or loci) within some filename space.

* On Unix both the working and root directories can be changed;

* On Windows the current (working) directory is a feature of the current volume; changing to another volume and back again will bring you to the same working directory, even if you changed the current directory on another volume. (This behaviour changes between different versions of Windows.)

* On Classic Mac (and VMS?) only the working directory can be changed; the root directory is faked to be the top of the startup volume;

* On RMX an arbitrary number [*] of current loci can be established, and referred to as if they were independent volumes, or accessed by open handles (much like file descriptors); the standard C library uses these to fake the behaviour of various POSIX functions, but these loci can be shared between processes and thus the POSIX emulation can be fooled.
* Similarly, versions of Unix which have fchdir and/or fchroot allow a working directory or root directory to be selected from an arbitrary number of already-opened directories;

* Some (ancient) systems don't have any directory hierarchy, so a root directory is meaningless.

But also importantly, in the general case it is not possible to determine a path between two loci, and in particular between a root directory and a working directory.

* In Unices with fchdir it is possible to have a current working directory that is outside the current root directory;

* Filesystem permissions may prevent traversing from one locus to another; (normally this would prevent construction of a path from one to the other, but even given such a path, it might not be usable)

The more important question is how do we interpret these things to decide if certain operations should reasonably be expected to succeed? Give or take ownership issues of course...

Some of them we already can do somewhat portably:

* How do we take the results of readdir and make them usable?

* If we use chdir, how do we later get back to the same working directory?

* Is a given filename dependent on the working directory?

* Do two pathnames A and B refer to the same entity? Just by inspecting the pathnames? By checking whether they're links to the same file (inode)?

* Do two pathnames A and B refer to entities in the same directory? If so then we can assume that if permissions allow us to access A then they will probably also allow us to access B. Not that we shouldn't check the results of both attempts of course, but if one succeeds and the other fails then we would be excused for just bailing instead of trying harder.

Some of them are a lot harder to do portably:

* Can we rename a file from name A to name B? A directory? If it's one that we just created? One that we got from readdir? How can we construct A from B or B from A to guarantee that we can? Roughly this translates to "are A and B on the same volume?"
unless you're on Unix where we pretend that there aren't any volumes...

* How do we do transactional file replacement? That is, either replace a target file with a complete replacement, or not at all. On Unix we do this by creating a temporary file in the same directory and, once it has been completely written, renaming it to replace the target atomically. Or just deleting it to roll back the transaction. Assuming this method is possible for another OS, how do we construct the temporary filename from the target filename?

* Can we create a hard link from name A to name B? A symbolic link? How can we construct A from B or B from A to guarantee that we can? Given two pathnames A and B, how do we make the shortest relative path C between them (to use for a relative symbolic link)? On Unix you can create a hard link anywhere under the same mount
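The Unix method described above (write a temporary file next to the target, then rename it over the target) and the "same entity" check by device and inode can be sketched in C roughly as follows. This is only a POSIX sketch and not part of the patch; the names same_entity and replace_file_atomically are hypothetical, and the ".XXXXXX" suffix is just one possible answer to the question of how to derive the temporary name from the target name.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for mkstemp() and fsync() under strict -std */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: do pathnames a and b name the same file?
 * Compares device and inode numbers, so hard links compare equal.
 * Returns 1 if same, 0 if different, -1 if either stat() fails. */
int same_entity(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Hypothetical helper: replace target with data, all or nothing.
 * The temporary file lives in the target's directory so that rename()
 * does not have to cross a mount point. mkstemp() creates the file with
 * mode 0600; the target's old permissions are not preserved here.
 * Returns 0 on success, -1 on error (the target is then left untouched). */
int replace_file_atomically(const char *target, const char *data, size_t len)
{
    char tmp[4096];
    size_t off = 0;
    int fd;
    int n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", target);

    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;
    fd = mkstemp(tmp);
    if (fd < 0)
        return -1;

    while (off < len) {
        ssize_t w = write(fd, data + off, len - off);
        if (w < 0)
            goto rollback;
        off += (size_t)w;
    }
    if (fsync(fd) != 0)
        goto rollback;
    if (close(fd) != 0) {
        fd = -1;
        goto rollback;
    }
    fd = -1;
    if (rename(tmp, target) != 0)   /* commit: atomic on POSIX filesystems */
        goto rollback;
    return 0;

rollback:
    if (fd >= 0)
        close(fd);
    unlink(tmp);                    /* roll back: drop the temporary file */
    return -1;
}
```

A caller would use it as replace_file_atomically("config.txt", buf, len); readers of the target see either the old or the new contents, never a partial file.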
Re: [PATCH] File Spec
[EMAIL PROTECTED] [EMAIL PROTECTED] wrote: [ snipped a lot of explanations ] Please keep in mind, that the intended usage inside Parrot just should be to locate some standard include or extension files for Parrot internals. More abstraction and complexity can always be added above that or implemented by HLLs. leo
Re: [PATCH] File Spec
Leopold Toetsch wrote:
> [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
>> [ snipped a lot of explanations ]
>
> Please keep in mind, that the intended usage inside Parrot just should be to locate some standard include or extension files for Parrot internals. More abstraction and complexity can always be added above that or implemented by HLLs.
>
> leo

Is there a plan for operating systems without Unix-like hierarchical directory structures (e.g. IBM I-Series, I think z/OS, I'd assume many other enterprise OSs)? There are further difficulties in that some of these have multiple filesystems which look totally different from each other etc.

In general how much effort is it likely to be to get Parrot working on systems which don't look at all like Unix? I've tried to get Perl 5 to build on os/400 before and it wasn't a pleasant experience. Any chance it'll be easier to port Parrot?

Chris
Re: [PATCH] File Spec
> Though I haven't been following this thread, it seems you're coming up with some File::Spec-like thing for Parrot?

Exactly.

> I'd recommend looking at Ken Williams' excellent Path::Class module

Surely, I will.

> So yes, you must distinguish between concatenating directories and files. You also must worry about volumes.

Yeah .. I'll consider that.

Thanks a lot, Michael
Re: [PATCH] File Spec
Though I haven't been following this thread, it seems you're coming up with some File::Spec-like thing for Parrot? I'd recommend looking at Ken Williams' excellent Path::Class module which gives you actual file and directory objects. EXTREMELY useful when you're in an ultra-cross platform environment such as Parrot. I wish I had them for MakeMaker instead of fucking around with File::Spec. Consider using Path::Class for inspiration rather than File::Spec.

On Mon, Sep 01, 2003 at 02:38:36PM +0300, Vladimir Lipskiy wrote:
> Leo wrote:
>> Albeit File::Spec is using catfile and catdir, I don't like the function names (cat file is on *nix what type file is on Win*). Maybe concat_pathname and concat_filename is better.
>
> Yes, indeed. I'm for having concat_pathname only, since neither this patch nor the File::Spec module makes any distinction when concatenating paths and files (though I can be mistaken on account of VMS, Dan? (~:). So catdir and catfile give the same result. Moreover, catfile is sort of a wrapper around catdir and does nothing smarter than just calling catdir on all platforms.

On VMS catfile and catdir do very different things because VMS filepath syntax distinguishes between files and directories explicitly.

Unix:    /dir1/dir2/dir3    /dir1/dir2/file
Windows: \dir1\dir2\dir3    \dir1\dir2\file
VMS:     [dir1.dir2.dir3]   [dir1.dir2]file

So yes, you must distinguish between concatenating directories and files. You also must worry about volumes.

Unix:    No user visible concept of a volume
Windows: VOLUME:\dir1\dir2\file
VMS:     VOLUME:[dir1.dir2]file

--
Michael G Schwern [EMAIL PROTECTED] http://www.pobox.com/~schwern/
Operation Thrusting Peach
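To make the catdir/catfile distinction in the tables above concrete, here is a minimal C sketch that builds a path in each of the three syntaxes. It is an illustration only: PathSyntax, cat_path and the calling convention are hypothetical and do not come from the patch or from File::Spec, and real VMS syntax has more cases than this handles.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { SYNTAX_UNIX, SYNTAX_WINDOWS, SYNTAX_VMS } PathSyntax;

/* Append src to the string in dst (capacity cap).
 * Returns 0 on success, -1 if the result would not fit. */
static int append(char *dst, size_t cap, const char *src)
{
    size_t used = strlen(dst), need = strlen(src);
    if (used + need + 1 > cap)
        return -1;
    memcpy(dst + used, src, need + 1);
    return 0;
}

/* Build an absolute path from directory names and an optional file name.
 * file == NULL behaves like catdir, otherwise like catfile.
 * volume is ignored for Unix syntax and may be NULL for the others. */
int cat_path(PathSyntax syn, const char *volume,
             const char *const *dirs, size_t ndirs,
             const char *file, char *out, size_t cap)
{
    size_t i;

    if (cap == 0)
        return -1;
    out[0] = '\0';

    if (volume && syn != SYNTAX_UNIX) {
        if (append(out, cap, volume) || append(out, cap, ":"))
            return -1;
    }
    if (syn == SYNTAX_VMS) {
        /* VMS: directories go inside the brackets, the file outside. */
        if (append(out, cap, "["))
            return -1;
        for (i = 0; i < ndirs; i++) {
            if (i > 0 && append(out, cap, "."))
                return -1;
            if (append(out, cap, dirs[i]))
                return -1;
        }
        if (append(out, cap, "]"))
            return -1;
        if (file && append(out, cap, file))
            return -1;
    } else {
        /* Unix and Windows: a file is just one more separated component. */
        const char *sep = (syn == SYNTAX_UNIX) ? "/" : "\\";
        for (i = 0; i < ndirs; i++) {
            if (append(out, cap, sep) || append(out, cap, dirs[i]))
                return -1;
        }
        if (file && (append(out, cap, sep) || append(out, cap, file)))
            return -1;
    }
    return 0;
}
```

On Unix and Windows the file branch is the same operation as adding one more directory, which is why catfile can be a thin wrapper around catdir there; on VMS the two results differ in shape.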
Re: [PATCH] File Spec
Vladimir Lipskiy [EMAIL PROTECTED] wrote:

[ my first answer seems to be missing ]

>> From: Leopold Toetsch [EMAIL PROTECTED]
>> Subject: TWEAKS: Takers Wanted - Effort And Knowledge Sought
>>
>> Platform code - We need some functions to deal with paths and files like File::Spec. For loading include files or runtime extension some search path should be available to locate these files (a la use lib LIST;). For now runtime/parrot/{include,dynext} and the current working directory would be sufficient.

> I ain't 100% sure what Leo wanted there and am afraid that my patch is out of place. Though it presents rudimentary support for the Parrot File::Spec-like functions, which are as follows: curdir, catdir, catfile.

Albeit File::Spec is using catfile and catdir, I don't like the function names (cat file is on *nix what type file is on Win*). Maybe concat_pathname and concat_filename is better.

> I should warn you the patch lacks any documentation. Examples of usage can be found in file_spec.t. Nevertheless, does it need some documentation written for non-perl folks, and if it does, where should I put it? The docs directory?

docs/dev is the place for documents about internal functionality and design decisions.

WRT the patch - please can people having experience with different platforms have a look at it, if the functionality would be able to cope with all platform weirdness.

> =head1 NAME

[ can you switch your mailer to plain text, thanks ]

[ WRT diff: make a copy of your original tree, do modifications there and then cd ..; diff -urN parrot parrot-modified ]

Thanks,
leo
Re: [PATCH] File Spec
Leo wrote:
> Albeit File::Spec is using catfile and catdir, I don't like the function names (cat file is on *nix what type file is on Win*). Maybe concat_pathname and concat_filename is better.

Yes, indeed. I'm for having concat_pathname only, since neither this patch nor the File::Spec module makes any distinction when concatenating paths and files (though I can be mistaken on account of VMS, Dan? (~:). So catdir and catfile give the same result. Moreover, catfile is sort of a wrapper around catdir and does nothing smarter than just calling catdir on all platforms.

We can bring concat_filename in too (I don't mind), but as an alias of concat_pathname. I don't know how to implement this (I mean aliasing) in terms of parrot, though. Can we do it in some elegant way? However, for consistency's sake, I really really want us to have only concat_pathname, because whether we concatenate dirs or dirs and a file we always do the same thing -- concatenate a path.

> docs/dev is the place for documents about internal functionality and design decisions.

Okay.

> WRT the patch - please can people having experience with different platforms have a look at it, if the functionality would be able to cope with all platform weirdness.

For the time being, it can work properly only on Windows and Unix platforms. Why is it so? I feel I should give some explanation of how it works. There is only one generic function, catdir, rather than many as we have in File::Spec. And there are some filters[1], which we can assign to an array Filters.

    typedef void (*ParrotFSFilter)(struct Parrot_Interp *, STRING **);

    ParrotFSFilter Filters[] = { filter_1, filter_2, ... , filter_n };

When we have PASM code such as

    set S0, "foo_dir"
    set S1, "bar_dir"
    catdir S0, S1

it first calls the file_spec_catdir() function, which only glues the parts together with an OS-specific directory separator and then passes control to another function, file_spec_filter().
No doubt after the gluing a path can contain some trash like successive slashes; that's why we call file_spec_filter, which in its turn calls each function registered in the Filters array. Filters can be OS-specific: there is no sense in registering a filter that does the

    # xx///xx -> xx/xx

change when you are working on cygwin. Another question is how we can add an OS-specific filter. There is nothing to it:

    ParrotFSFilter Filters[] = {
        file_spec_some_filter,
    #ifndef PARROT_OS_NAME_IS_CYGWIN
        file_spec_successive_slashes_filter,
    #endif
        file_spec_filter_which_deletes_redundant_root_direct,
    #ifdef UNIX
        file_spec_vms_specific_filter,
    #endif
        file_spec_yet_another_filter,
        /* and so on */
    };

If somebody can think of a plan that could manage without macros, you know, ideas are always welcome.

Now that you know how it's supposed to work, I can return to the question of why it can work properly only on Windows and Unix platforms. The answer is: filters haven't been implemented yet, because I am still hesitating over what would be the best solution for find 'n' replace actions. I wish I could hear some comments on that. To clarify what the heck I'm talking about, here is the fragment that I cut out of my initial mail:

> Next. In the future I'll need to be able to do some find 'n' replace actions in order to clean the trash off of paths. The perl version uses regexes like these:
>
>     $path =~ s|/+|/|g unless($^O eq 'cygwin');   # xx////xx  -> xx/xx
>     $path =~ s|(/\.)+/|/|g;                      # xx/././xx -> xx/xx
>     $path =~ s|^(\./)+||s unless $path eq "./";  # ./xx      -> xx
>     $path =~ s|^/(\.\./)+|/|s;                   # /../../xx -> xx
>     $path =~ s|/\Z(?!\n)|| unless $path eq "/";  # xx/       -> xx
>
> The question is whether I should take advantage of string_str_index, string_replace and friends, or is there a better solution? In any case it never uses long paths, so we won't be violently penalized while using any find 'n' replace scheme.
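As one possible answer to the find 'n' replace question, the filter table and three of the Perl substitutions above can be sketched with hand-written scans instead of a regex engine. This is only a sketch under stated assumptions: it works on plain C strings rather than Parrot STRINGs and takes no interpreter argument, and the names PathFilter, path_filters, apply_path_filters and filter_* are hypothetical. __CYGWIN__ is the macro the Cygwin compiler predefines; whether Parrot's configure exposes its own macro for this is not shown in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ParrotFSFilter: edits a C string in place. */
typedef void (*PathFilter)(char *path);

/* xx////xx -> xx/xx   (like s|/+|/|g) */
static void filter_successive_slashes(char *path)
{
    char *src = path, *dst = path;
    while (*src) {
        *dst++ = *src;
        if (*src == '/')
            while (*src == '/')     /* skip the rest of the run */
                src++;
        else
            src++;
    }
    *dst = '\0';
}

/* xx/././xx -> xx/xx   (like s|(/\.)+/|/|g) */
static void filter_dot_components(char *path)
{
    char *src = path, *dst = path;
    while (*src) {
        /* Drop every "/." that is directly followed by another '/'. */
        while (src[0] == '/' && src[1] == '.' && src[2] == '/')
            src += 2;
        *dst++ = *src++;
    }
    *dst = '\0';
}

/* xx/ -> xx   (like s|/\Z(?!\n)||, leaving a lone "/" alone) */
static void filter_trailing_slash(char *path)
{
    size_t len = strlen(path);
    if (len > 1 && path[len - 1] == '/')
        path[len - 1] = '\0';
}

/* The registered filters; platform-specific ones are compiled in or out. */
static const PathFilter path_filters[] = {
#ifndef __CYGWIN__
    filter_successive_slashes,
#endif
    filter_dot_components,
    filter_trailing_slash,
};

/* Counterpart of file_spec_filter(): run every registered filter in order. */
void apply_path_filters(char *path)
{
    size_t i;
    for (i = 0; i < sizeof path_filters / sizeof path_filters[0]; i++)
        path_filters[i](path);
}
```

Each filter only ever shortens the string, so working in place needs no allocation; a STRING-based version would have to decide separately how to handle encodings and buffer ownership.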
There is one more thing to be said: in some cases a result obtained with the parrot file spec will diverge from a result obtained with the perl one. For instance,

    set S0, ""
    set S1, ""
    concat_pathname S0, S1
    print S1

prints an empty string, but File::Spec's equivalent

    my $path = catdir("", "");
    print $path;

prints / on Unix, Windows, and so forth. I don't think it's the Right result, though you can argue with me on that account. I'm gonna document all divergences.

> [ can you switch your mailer to plain text, thanks ]

Yep. I regularly do that. But sometimes my MTA outwits me.

> [ WRT diff: make a copy of your original tree, do modifications there and then cd ..; diff -urN parrot parrot-modified ]

Thanks, indeed. I'll try that as soon as I prepare a new patch.
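The empty-string divergence mentioned earlier in this message comes down to what the glue step does with empty parts. The C sketch below shows one rule that yields an empty result for two empty parts. It is an assumption made for illustration, not the patch's actual code, and glue_path is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical glue step: join a and b with sep, ignoring empty parts.
 * Writes the result to out (capacity cap).
 * Returns 0 on success, -1 if out is too small. */
int glue_path(const char *a, const char *b, char sep, char *out, size_t cap)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t need = la + lb + ((la && lb) ? 1 : 0) + 1;   /* +1 for '\0' */
    char *p = out;

    if (need > cap)
        return -1;
    memcpy(p, a, la);
    p += la;
    if (la && lb)
        *p++ = sep;     /* separator only between two non-empty parts */
    memcpy(p, b, lb);
    p += lb;
    *p = '\0';
    return 0;
}
```

Under this rule the separator is never emitted on its own, so two empty parts cannot turn into a root directory by accident; the File::Spec behaviour would correspond to always writing the separator and then cleaning up.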
[PATCH] File Spec
- Original Message -
From: Leopold Toetsch [EMAIL PROTECTED]
Sent: Thursday, August 07, 2003 12:51 PM
Subject: TWEAKS: Takers Wanted - Effort And Knowledge Sought

> Platform code - We need some functions to deal with paths and files like File::Spec. For loading include files or runtime extension some search path should be available to locate these files (a la use lib LIST;). For now runtime/parrot/{include,dynext} and the current working directory would be sufficient.

I ain't 100% sure what Leo wanted there and am afraid that my patch is out of place. Though it presents rudimentary support for the Parrot File::Spec-like functions, which are as follows: curdir, catdir, catfile.

I should warn you the patch lacks any documentation. Examples of usage can be found in file_spec.t. Nevertheless, does it need some documentation written for non-perl folks, and if it does, where should I put it? The docs directory?

Next. In the future I'll need to be able to do some find 'n' replace actions in order to clean the trash off of paths. The perl version uses regexes like these:

    $path =~ s|/+|/|g unless($^O eq 'cygwin');   # xx////xx  -> xx/xx
    $path =~ s|(/\.)+/|/|g;                      # xx/././xx -> xx/xx
    $path =~ s|^(\./)+||s unless $path eq "./";  # ./xx      -> xx
    $path =~ s|^/(\.\./)+|/|s;                   # /../../xx -> xx
    $path =~ s|/\Z(?!\n)|| unless $path eq "/";  # xx/       -> xx

The question is whether I should take advantage of string_str_index, string_replace and the rest & Co., or is there a better solution? In any case it never uses long paths, so we won't be violently penalized while using any find 'n' replace scheme.

The last. I beg to be excused: I couldn't prepare unified diffs of file.ops, file_spec.c, file_spec.h, and file_spec.t with diff -N -u. Alas.
The best I got was:

    cvs server: I know nothing about file.ops
    cvs server: I know nothing about file_spec.c
    cvs server: I know nothing about include/parrot/file_spec.h
    cvs server: I know nothing about t/op/file_spec.t

Probably -N works only with files that have already been added or removed, and I have no write access to add those files to the repository. I won't be surprised if, oops, I did something wrong again.

Comments, requests, threats are welcome, you know.

file_spec.diff  Description: Binary data
file.ops        Description: Binary data
file_spec.c     Description: Binary data
file_spec.h     Description: Binary data
file_spec.t     Description: Binary data