On Thu, 10 Aug 2000, Perl6 RFC Librarian wrote:
> Given this input file:
>
> D O S CR LF 0044 004F 0053 000D 000A
> U n i x LF 0055 006E 0069 0078 000A
> M a c CR 004D 0061 0063 000D
> l i n e LS 006C 0069 006E 0065 2028
> p a r a PS 0070 0061 0072 0061 2029
> l i n e 006C 0069 006E 0065
>
> This should work as expected on as many platforms as possible:
>
> my @lines = <FH>;
>
> The @lines array should contain six elements.
Well, if _I_ have an input file like the above, it most likely isn't
text.
>
> Bart Lateur has suggested differentiating between ASCII-compatible
> and UTF-16. Perhaps a flag?
Yes, that's how things are currently implemented in Perl 5, I believe
Nick said. (Well, not like you have implemented. Internally, per
string.)
open FOO... # syntax to follow, but assume an ASCII file
@foo = <FOO>; # Each string in @foo is flagged ASCII.
open BAR... # assume utf-16
@bar = <BAR>; # Each string in @bar is flagged utf-16.
@baz = map { $foo[$_] . $bar[$_] } ( 0 .. 10 );
# the first eleven lines of @foo are "promoted" to utf-16,
# the concatenation done, and stored to @baz, which is
# flagged utf-16.
open >OUT_ASCII... # Open for writing ascii;
open >OUT_UTF32... # open for writing utf-32;
print OUT_ASCII @baz; # Either an error, or data truncation. :(
print OUT_UTF32 @baz; # promotes all strings to utf-32 and writes them
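For comparison, here is a rough sketch of the same promote-then-write behaviour using the Encode module that ships with later Perl 5 releases. It is only an analogy: Perl 5 does not tag strings as "ascii" or "utf-16" the way the pseudocode above does, and the sample strings and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Two inputs decoded from different encodings into Perl character strings.
my $foo = decode('ascii',    "abc");         # three ASCII characters
my $bar = decode('UTF-16BE', "\x20\x28");    # one character, U+2028

# Concatenation works on characters; no manual promotion is needed.
my $baz = $foo . $bar;

# Writing to a wide encoding always succeeds: 4 characters * 4 bytes.
my $utf32 = encode('UTF-32BE', $baz);
print length($utf32), "\n";

# Writing to ASCII cannot represent U+2028. With FB_CROAK this is an
# error rather than silent data loss; LEAVE_SRC keeps $baz untouched.
my $ok = eval {
    encode('ascii', $baz, Encode::FB_CROAK | Encode::LEAVE_SRC);
    1;
};
print $ok ? "encoded\n" : "cannot encode as ASCII\n";
```

Without a check argument, encode substitutes a replacement character instead of dying, which corresponds to the "data truncation" branch above.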
>
> The binmode function should treat data as binary and not translate
> line disciplines. (No one objects to this so far?)
binmode should be a line discipline itself.
>
> Whether $/ will remain in Perl 6 is uncertain, so this is not
> necessarily about $/.
Agreed.
>
> Bart Lateur suggested using a dedicated DFA regex engine.
Which was a good suggestion. My impression of how line disciplines
would/could be used is that they handle the splitting of an input string,
much as split would. split takes a regex, so why not the $/ equivalent?
(Other than not knowing what to put back in a -l type context.)
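To make the split analogy concrete, here is a minimal Perl 5 sketch that treats the RFC's sample input as one already-decoded string and splits it on a regex. The variable names and the choice of terminators (CRLF, CR, LF, LS, PS) are my assumptions taken from the sample file above, not a proposal for what $/ should accept.

```perl
use strict;
use warnings;

# The RFC's sample input as a single string of characters.
my $input = join '',
    "DOS\x{0D}\x{0A}",
    "Unix\x{0A}",
    "Mac\x{0D}",
    "line\x{2028}",
    "para\x{2029}",
    "line";

# CRLF comes first in the alternation so it is consumed as one
# terminator instead of being split as CR followed by LF.
my $terminator = qr/\x{0D}\x{0A}|[\x{0D}\x{0A}\x{2028}\x{2029}]/;

my @lines = split $terminator, $input;

print scalar(@lines), "\n";    # prints 6
```

Note that split throws the terminators away, which is exactly the "what to put back" problem: once the separator is a pattern, there is no single string to re-append on output.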
Two major questions from me, I guess.
Aren't line disciplines mainly going to be emulated? IOW, would Perl
"line disciplines" necessarily map 1-1 and onto sfio line disciplines?
I think of line disciplines as hints to open, or <>, actually, for how
to process the data. From this perspective, they could follow the
standard line discipline syntax of :foo, or use something as simple as
a Perl hash, and would not necessarily be limited to the open call.
(This is all a wag, don't take it as gospel.)
So you'd potentially have hints like so:
:bin
:text
:text=ascii
:text=utf8
:text=utf32
:text=ebcdic
:text=some_weird_proprietary_format
:block=<size>
:line=
:line=dos
:line=mac
:line=unix
:line=/like, you know?/
:mod=chomp
:mod=split
:mod=\&do_some_funky_pre_processing
That would allow you to do stuff like the following.
open FOO, "foo_file", :text=ascii, :line=unix, :mod=chomp;
while (<FOO>)
{
# $_ = an ascii line, delimited by \n, but without the \n;
}
open FOO, "foo_file", :bin, :block=1024;
while (<FOO>)
{
# Now $_ is a block of 1024. Easy read. (I know,
# I know, what about eof()?)
}
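For what it's worth, Perl 5 can already approximate the :block hint by setting $/ to a reference to an integer, which switches <> to fixed-size record reads. A small self-contained sketch follows; the scratch file and its 2500-byte size are assumptions chosen so the last block comes up short.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a 2500-byte scratch file so the example is self-contained.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} 'x' x 2500;
close $tmp or die "close: $!";

open my $fh, '<', $path or die "open: $!";
binmode $fh;

my @sizes;
{
    # A reference to an integer asks for records of at most that many bytes.
    local $/ = \1024;
    while (my $block = <$fh>) {
        push @sizes, length $block;
    }
}
close $fh or die "close: $!";

print "@sizes\n";    # prints 1024 1024 452
```

This also shows one possible answer to the eof() question: the final read simply returns a short block, and the read after that returns undef and ends the loop.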
open FOO, "foo_file", :text, :line;
# Sample the input stream and make a best guess
$text_mode = taste FOO, "text";
$line_mode = taste FOO, "line";
spank FOO, text => $text_mode, line => $line_mode;
while (<FOO>)
{
# Reads with the right disciplines now
}
open FOO, "frozen_foo", :bin;
spank FOO, block => 48, mod => \&thaw_struct;
while (<FOO>)
{
# Do something with the object that is $_
}
--
Bryan C. Warnock
([EMAIL PROTECTED])