Re: RFC 69 (v3) Standardize input record separator (for

Bryan C . Warnock Thu, 10 Aug 2000 18:31:04 -0700
On Thu, 10 Aug 2000, Perl6 RFC Librarian wrote:

> Given this input file:
> 
>     D O S CR LF    0044 004F 0053 000D 000A
>     U n i x  LF    0055 006E 0069 0078 000A
>     M a c CR       004D 0061 0063 000D
>     l i n e  LS    006C 0069 006E 0065 2028
>     p a r a  PS    0070 0061 0072 0061 2029
>     l i n e        006C 0069 006E 0065
> 
> This should work as expected on as many platforms as possible:
> 
>     my @lines = <FH>;
> 
> The @lines array should contain six elements.

Well, if _I_ have an input file like the above, it most likely isn't
text.  

> 
> Bart Lateur has suggested differentiating between ASCII-compatible
> and UTF-16.  Perhaps a flag?

Yes, that's how things are currently implemented in Perl 5, I believe
Nick said.  (Well, not like you have implemented.  Internally, per
string.)

open FOO... # syntax to follow, but assume an ASCII file
@foo = <FOO>;  # Each string in @foo is flagged ASCII.

open BAR... # assume utf-16
@bar = <BAR>; # Each string in @bar is flagged utf-16.

@baz = map { $foo[$_] . $bar[$_] } ( 0 .. 10 );

# the first eleven lines of @foo are "promoted" to utf-16, 
# the concatenation done, and stored to @baz, which is
# flagged utf-16.

open >OUT_ASCII... # Open for writing ascii;
open >OUT_UTF32... # open for writing utf-32;

print OUT_ASCII @baz; # Either an error, or data truncation.  :(
print OUT_UTF32 @baz;  # promotes all strings to utf-32 and writes them
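
A rough illustration of the promotion and of the failure mode on output, in runnable Perl 5.  This assumes the core Encode module as a stand-in for the per-string flags; the sample strings and variable names are made up, and none of this is the proposed Perl 6 behaviour.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-ins for one line from each file (illustrative data only).
my $foo = decode('ascii',    "plain ");      # as if read from an ASCII file
my $bar = decode('UTF-16BE', "\x26\x3A");    # U+263A, as if read from a UTF-16 file

# Concatenation works on characters, so the result now holds a
# character that ASCII cannot represent.
my $baz = $foo . $bar;

# A strict encode to ASCII dies rather than silently dropping data.
my $ascii = eval { encode('ascii', $baz, Encode::FB_CROAK | Encode::LEAVE_SRC) };
my $ascii_failed = defined $ascii ? 0 : 1;

# Encoding the same string to UTF-32 works: four bytes per character.
my $utf32 = encode('UTF-32BE', $baz);

print "ascii write ", ($ascii_failed ? "failed" : "ok"), "\n";
print "utf-32 bytes: ", length($utf32), "\n";
```

Whether the ASCII case should die or substitute a replacement character is exactly the "error, or data truncation" choice above; Encode leaves that to the caller through its check argument.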

> 
> The binmode function should treat data as binary and not translate
> line disciplines.  (No one objects to this so far?)

binmode should be a line discipline itself.

> 
> Whether $/ will remain in Perl 6 is uncertain, so this is not
> necessarily about $/.

Agreed.

> 
> Bart Lateur suggested using a dedicated DFA regex engine.

Which was a good suggestion.  My impression of line disciplines is that
they would (or could) be used to handle the splitting of an input stream,
much as split would.  split takes a regex, so why not the $/ equivalent?
(Other than not knowing what to put back in a -l type context.)
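
To make the split comparison concrete, here is a minimal Perl 5 sketch that splits an already-decoded string on a regex record separator.  The helper name split_records is hypothetical, and the sample string is a short stand-in for the six-line file in the RFC; Perl 5's $/ itself cannot hold a regex, so this works on a buffer rather than a filehandle.

```perl
use strict;
use warnings;

# Hypothetical helper: split decoded text into records on any of the
# terminators from the RFC example (CR LF, LF, CR, LS, PS).
sub split_records {
    my ($text) = @_;
    # CR LF is listed first so it is not seen as CR followed by LF.
    my @records = split /\x{000D}\x{000A}|[\x{000A}\x{000D}\x{2028}\x{2029}]/,
                        $text, -1;
    # A terminator at the very end leaves one empty trailing field; drop it.
    pop @records if @records && $records[-1] eq '';
    return @records;
}

my $input = "DOS\x{0D}\x{0A}Unix\x{0A}Mac\x{0D}line\x{2028}para\x{2029}line";
my @lines = split_records($input);
print scalar(@lines), " records\n";
```

The terminators are discarded here, which is the -l problem mentioned above: once several different separators are accepted, there is no single obvious one to put back on output.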

Two major questions from me, I guess.

Aren't line disciplines mainly going to be emulated?  IOW, would Perl
"line disciplines" necessarily map 1-1 and onto sfio line disciplines?
I think of line disciplines as hints to open, or <>, actually, for how
to process the data.  From this perspective, they could follow the
standard line discipline syntax of :foo, or use something as simple as
a Perl hash, and would not necessarily be limited to the open call.
(This is all a wag, don't take it as gospel.)

So you'd potentially have hints like so:

:bin
:text
:text=ascii
:text=utf8
:text=utf32
:text=ebcdic
:text=some_weird_proprietary_format
:block=<size>
:line=
:line=dos
:line=mac
:line=unix
:line=/like, you know?/
:mod=chomp
:mod=split
:mod=\&do_some_funky_pre_processing

That would allow you to do stuff like the following.

open FOO, "foo_file", :text=ascii, :line=unix, :mod=chomp;
while (<FOO>)
{
        # $_ is an ascii line, delimited by \n, but without the \n;
}
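
The hints-as-a-hash idea can be approximated in Perl 5 today.  The sketch below is only an emulation under stated assumptions: reader_with_hints and the %LINE_SEP table are hypothetical names, only the line and mod hints are handled, and an in-memory filehandle stands in for "foo_file".

```perl
use strict;
use warnings;

# Hypothetical mapping from :line= hint names to separators.
my %LINE_SEP = (dos => "\x0D\x0A", mac => "\x0D", unix => "\x0A");

# Hypothetical helper: wrap a filehandle in a closure that applies the hints.
sub reader_with_hints {
    my ($fh, %hint) = @_;
    my $sep = $LINE_SEP{ $hint{line} // 'unix' };
    die "unknown line hint\n" unless defined $sep;
    my $mod = $hint{mod};
    binmode $fh;                      # no platform newline translation
    return sub {
        local $/ = $sep;              # separator applies to this read only
        my $rec = <$fh>;
        return unless defined $rec;
        chomp $rec if defined $mod && !ref $mod && $mod eq 'chomp';
        $rec = $mod->($rec) if ref $mod eq 'CODE';
        return $rec;
    };
}

my $data = "one\x0D\x0Atwo\x0D\x0Athree";
open my $fh, '<', \$data or die "open: $!";
my $next = reader_with_hints($fh, line => 'dos', mod => 'chomp');

my @got;
while (defined(my $rec = $next->())) {
    push @got, $rec;
}
print join('|', @got), "\n";
```

Passing a code reference as mod covers the \&do_some_funky_pre_processing case from the list; the text= hints are left out because they would need an encoding layer.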

open FOO, "foo_file", :bin, :block=1024;
while (<FOO>)
{
        # Now $_ is a block of 1024.  Easy read.  (I know,
        # I know, what about eof()?)
}
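
For comparison, Perl 5 already supports fixed-size records by setting $/ to a reference to an integer, and it also shows one answer to the eof() question: the final read simply returns a short block.  The 2500-byte in-memory buffer below is made-up test data.

```perl
use strict;
use warnings;

# Made-up data: 2500 bytes, so the last block is shorter than 1024.
my $data = 'x' x 2500;
open my $fh, '<', \$data or die "open: $!";
binmode $fh;

my @sizes;
{
    local $/ = \1024;                 # read records of at most 1024 bytes
    while (defined(my $block = <$fh>)) {
        push @sizes, length $block;
    }
}
print "@sizes\n";
```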

open FOO, "foo_file", :text, :line;
# Sample the input stream and make a best guess
$text_mode = taste FOO, "text";  
$line_mode = taste FOO, "line";
spank FOO, text => $text_mode, line => $line_mode;
while (<FOO>)
{
        # Reads with the right disciplines now
}
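
One possible shape for the "line" half of taste, as a Perl 5 sketch.  The name taste_line, the idea of counting terminators in a sample, and the tie-breaking rule are all assumptions made for illustration; nothing in the RFC defines how the guess would be made.

```perl
use strict;
use warnings;

# Hypothetical helper: guess the line convention from a sample of input
# by counting each kind of terminator.  Returns 'dos', 'mac', 'unix',
# or undef when the sample contains no terminator at all.
sub taste_line {
    my ($sample) = @_;
    my %count;
    $count{dos}  = () = $sample =~ /\x0D\x0A/g;       # CR LF pairs
    $count{mac}  = () = $sample =~ /\x0D(?!\x0A)/g;   # CR not followed by LF
    $count{unix} = () = $sample =~ /(?<!\x0D)\x0A/g;  # LF not preceded by CR
    # Highest count wins; ties are broken alphabetically by name.
    my ($best) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    return $count{$best} ? $best : undef;
}

my $guess = taste_line("a\x0D\x0Ab\x0D\x0Ac\x0A");
print defined $guess ? $guess : 'unknown', "\n";
```

A guess like this can be wrong on short or mixed samples, so it would presumably stay advisory, with the caller free to override it.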


open FOO, "frozen_foo", :bin; 
spank FOO, block => 48, mod => \&thaw_struct;
while (<FOO>)
{
        # Do something with the object that is $_
}



 -- 
Bryan C. Warnock
([EMAIL PROTECTED])