Re: Working with files wish list

Leon Timmermans Mon, 15 Dec 2008 10:58:37 -0800

On Mon, Dec 15, 2008 at 5:43 PM, Richard Hainsworth
<rich...@rusrating.ru> wrote:
> Following the request for ideas on IO, this is my wish list for working with
> files. I am not a perl guru and so I do not claim to be able to write
> specifications. But I do know what I would like.
>
> The organisation of the IO as roles seems to be a great idea. I think that
> what is suggested here would fall in naturally with that idea.
>
> Suggestions:
>
> a) I am fed up with writing something like
>
> open(FP, ">${fname}_out.txt") or die "Cant open ${fname}_out.txt for writing\n";
>
> The complex definition of the filename is only to show that it has to be
> restated identically twice.
>
> Since the error code I write (die "blaa") is always the same, surely it can
> be made into a default that reports on what caused the die and hidden away
> as a default pointer to code that can be overridden if the programmer wants
> to.
>


You could think along the lines of

my $fh = open '>', $filename, :errorstring("Could not open %file: %error");

It doesn't repeat itself, but still gives the programmer the chance to
add a helpful message.
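
As a rough Perl 5 illustration of the same idea (a sketch only:
open_or_die is a hypothetical helper name, and the %file and %error
placeholders are borrowed from the line above, not from any existing
API):

```perl
use strict;
use warnings;

# Hypothetical helper: open a file or die with a templated message.
# The filename is stated once; %file and %error are filled in for the caller.
sub open_or_die {
    my ($mode, $filename, %opt) = @_;
    my $template = $opt{errorstring} // 'Could not open %file: %error';

    open(my $fh, $mode, $filename) and return $fh;

    # Capture $! immediately, before anything else can overwrite it.
    my %value = (file => $filename, error => "$!");
    (my $message = $template) =~ s/%(file|error)/$value{$1}/g;
    die "$message\n";
}

# Usage (assumed variable $fname):
#   my $fh = open_or_die('>', "${fname}_out.txt");
```

The default template covers the common case, and the errorstring
option is the override hook for anyone who wants a different message.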

> b) Why do I have to 'open' anything? Surely when software first identifies a
> File object (eg., names it) that should be sufficient signal to do all the
> IO things. So, I would love to write
>
> my File $file .= new(:name<mydatafile.txt>);
>
> my File $output .=new(:name<myresults.txt>, :mode<write>);
>
> and then:
>
> while $file.read {…};
>
> or:
>
> say "Hello world" :to<$output>;
>
> The defaults would include error routines that die if errors are
> encountered, read as the default mode, and a text file with EndOfLine
> markers as the file type. Obviously, other behaviours, such as not dying,
> but handling the lack of a file with a request to choose another file, could
> be accommodated by overriding the appropriate role attribute.
>
> The suggestion here is that the method "say" on a File object is provided in
> a role and has some attributes, eg., $.error_code, that can be assigned to
> provide a different behaviour.

open() is an idiom, and IMHO not an inappropriate one: it carries a
meaning with it. Even someone who programs in another language will
understand what's going on when you say open. When you say new, that
isn't necessarily the case. IMHO the word new focuses too much on the
object, while the resource it holds is far more important.

> c) I want the simplest file names for simple scripts. As Damian Conway has
> pointed out, naming a resource is a can of worms. I work with Cyrillic texts
> and filenames and still have problems with the varieties of char sets.
> Unicode has done a lot, but humans just keep pushing the envelope of what is
> possible. I don't think there will ever be a resolution until humanity has a
> single language and single script.
>
> It seems far better to me for standard resource names to be constrained to
> the simplest possible for 'vanilla' perl scripts, but also to let the
> programmer access the underlying bit/byte string so they can do what they
> want if they understand the environment.
>
> The idea of 'stringification', that is providing to the programmer for use
> inside the program a predictable representation of a complex object, also
> seems to me to be something to exploit. In the case of a resource name, the
> one most easily available to the programmer would be a 'stringified' version
> of the underlying stream of bytes used by the operating system.
>
> E.g., a File object located in some directory under some OS would have both
> $file.name as a Unicode representation and $file.underlying_name holding some
> arbitrary sequence of bits with semantics known only to the OS (and the
> perl implementation).

We talked about such issues before. The fact is that many Unix systems
don't store filenames as Unicode, but as opaque byte strings (blobs).
This means that you can't assume that filenames will be valid Unicode.

I'm not sure how to solve that cleanly and portably. I suspect there
is no way to do it that is both clean and portable, and we'll have to
choose :-/.
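
To make the problem concrete, here is a Perl 5 sketch of the
"display name plus underlying name" split. It is not a solution: it
simply assumes filenames are UTF-8 (which is exactly the non-portable
part), and describe_name is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: given the raw bytes of a filename, return
# (display_name, raw_bytes). display_name is undef when the bytes are
# not valid UTF-8, so the caller must be ready to fall back on the blob.
sub describe_name {
    my ($bytes) = @_;
    my $name = eval {
        # FB_CROAK makes invalid input die; LEAVE_SRC keeps $bytes intact.
        Encode::decode('UTF-8', $bytes,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return ($name, $bytes);
}
```

The raw bytes are always what gets handed back to the OS; the decoded
name is only for showing to the user.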

> d) It would be nice to specify filters on the incoming and outgoing data. I
> find I do the following all the time in perl5:
>
> while (<FN>) {chop; …};
>
> So my example above, viz.,
>
> while $file.read { … };
>
> would automatically provide $_ with a line of text with the EOL chopped off.
>
> Note that the reverse (adding an EOL on output) is so common that perl6 now
> has 'say', which does this.
>
> Could this behaviour (filtering off and on the EOL) be made a part of the
> standard "read" and "say" functions?

Autochomping is already in the language. It's very underspecified though.
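
For comparison, the Perl 5 equivalent has to be spelled out by hand.
A minimal sketch (chomped_lines is a hypothetical helper name; it uses
chomp rather than chop, so a final line without an EOL does not lose
its last character):

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a filehandle in an iterator that returns
# one line per call with the trailing newline removed, or undef at EOF.
sub chomped_lines {
    my ($fh) = @_;
    return sub {
        my $line = <$fh>;
        return undef unless defined $line;
        chomp $line;
        return $line;
    };
}

# Usage (assumed filehandle $fh):
#   my $next = chomped_lines($fh);
#   while (defined(my $line = $next->())) { ... }
```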

> e) When dealing with files in directories in perl5 under linux, I need
>
> opendir(DIR,'./path/') or die "cant open ./path/\n";
>
> my @filelist = grep { /^.+\.txt/ } readdir(DIR);
>
> I would prefer something like
>
> my Location $dir .= new(:OSpath<'./data'>);
>
> and without any further code $dir contains an Array ($d...@elems) or Hash
> ($dir.%elems) (I don't know which, maybe both?) of File objects. If a Hash,
> then the keys would be the stringified .name attribute of the files.
>
> No need to opendir or readdir. Lazy evaluation could handle most situations,
> but where the Location could be constantly changing its contents, a
> $dir.refresh() method might suffice.

I agree there should be a single function that combines opendir,
readdir and closedir. Scalar readdir can be useful in some contexts,
but in my experience it's the less common usage. From a programmer's
point of view lazy operation would be convenient, but from a resource
management point of view it may be a bit complicated.
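
In Perl 5 terms, the eager version of such a combined function could
be sketched as follows (list_dir and its optional filter argument are
hypothetical names for illustration only):

```perl
use strict;
use warnings;

# Hypothetical helper: opendir + readdir + closedir in one call.
# Returns the sorted entry names, without '.' and '..'.
# An optional code ref selects which names to keep.
sub list_dir {
    my ($path, $filter) = @_;
    opendir(my $dh, $path)
        or die "Could not open directory $path: $!\n";
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);

    @names = grep { $filter->($_) } @names if $filter;
    my @sorted = sort @names;
    return @sorted;
}

# Usage:
#   my @filelist = list_dir('./path/', sub { $_[0] =~ /\.txt\z/ });
```

Because everything is read up front, the handle is closed before the
function returns, which sidesteps the resource management question at
the cost of laziness.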

> f) In general on directories, I am sure a variety of solutions could be
> conceived. It seems to me that abstractly any form of directory could be
> thought of as a Location, which has some path defined for it (though the
> syntax of the path is OS dependent), and which might have children
> locations. At a minimum, a Location would need to provide information about
> whether new resources could be created or accessed (eg., read / write
> permissions).
>
> There are various paradigms for defining how to traverse networks. At some
> point, our language legislators will need to define one for perl6.
>
> If the name of the location node, which can be exposed to the user, eg., by
> printing it or showing it in a GUI to be clicked on, is separated from the
> OS/locale-dependent underlying_name (which may not be easily displayed on a
> standard GUI – suppose it is in ancient Buriyat), then identifying nodes and
> traversing a network of nodes could be made abstract enough to handle all
> sorts of environments.
>
> Perhaps, too, a module for a specific environment, eg., Windows, would
> provide the syntactic sugar that makes specifying a location look like
> specifying a directory natively, eg.
> use IO::Windows;
> my Location $x .= new(:OSpath<C:\\Documents\perldata\>);
> whilst for linux it would be
> use IO::Linux;
> my Location $x .=new(:OSpath</home/perldata/>);
>
> This started as short wish list and got far too long. Sorry
>

I think a File::Spec-like approach would be better.
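
By that I mean building paths from components and letting the module
pick the native syntax, as Perl 5's File::Spec does today. A small
sketch (data_path and the 'perldata' directory name are made up for
the example):

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: build a path below a 'perldata' directory from
# its components, using whatever separator the current OS expects.
sub data_path {
    my (@parts) = @_;
    return File::Spec->catfile('perldata', @parts);
}

# Usage:
#   my $path = data_path('2008', 'results.txt');
#   my @components = File::Spec->splitdir($path);
```

The script never spells out a separator, so the same code works
without loading an OS-specific module by hand.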



Regards,

Leon Timmermans
