On 10 Aug 2000 17:02:01 -0000, Perl6 RFC Librarian wrote:
>The input record separator should match one of these (in UTF-8):
>
> 000D 000A
> 000A
> 000D
> 2028
> 2029
Just a technical correction: that is not UTF-8. Those are simply the hex
values of the 16-bit Unicode code points (what ord would return); UTF-16
if you insist. In UTF-8, 2028 and 2029 each take three bytes.
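
To make the difference concrete, here is a small sketch (in Python,
purely for illustration; the helper name encodings is made up) that
prints each code point from the list next to its actual UTF-8 and
UTF-16 (big-endian) byte sequences:

```python
# Code points from the RFC's list of line terminators.
CODE_POINTS = [0x000D, 0x000A, 0x2028, 0x2029]

def encodings(cp):
    """Return (utf8_hex, utf16be_hex) for a single code point."""
    ch = chr(cp)
    return ch.encode("utf-8").hex(), ch.encode("utf-16-be").hex()

if __name__ == "__main__":
    for cp in CODE_POINTS:
        utf8, utf16 = encodings(cp)
        print(f"U+{cp:04X}  UTF-8: {utf8}  UTF-16BE: {utf16}")
```

Only the UTF-16 column matches the digits in the RFC; the UTF-8 forms
of 2028 and 2029 are three bytes long.
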
>=head1 IMPLEMENTATION
>
>?
Something regexish, I would think, but I wouldn't recommend doing this
with Perl's normal regex engine. A dedicated, simpler DFA regex engine
ought to be employed instead. My hope is that it would be much faster,
virtually as fast as the current fixed-string search.
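
As a rough idea of how little machinery such a matcher needs, here is a
hand-coded two-state scanner for the five separators in the RFC. It is
only a sketch in Python, not a proposal for the actual implementation;
the names find_terminator and split_records are invented for the
example, and it works on already-decoded characters, not raw bytes.

```python
CR, LF = 0x000D, 0x000A
# Terminators that are always exactly one character long.
SIMPLE = {LF, 0x2028, 0x2029}

def find_terminator(text, pos=0):
    """Return (start, end) of the first terminator at or after pos, else None."""
    seen_cr = False          # the only extra state: "previous char was CR"
    start = None
    for i in range(pos, len(text)):
        cp = ord(text[i])
        if seen_cr:
            # CR LF counts as one terminator; anything else means a bare CR.
            return (start, i + 1) if cp == LF else (start, i)
        if cp == CR:
            seen_cr, start = True, i
        elif cp in SIMPLE:
            return (i, i + 1)
    if seen_cr:
        return (start, len(text))   # bare CR at the very end of the input
    return None

def split_records(text):
    """Split text into records, dropping the terminators."""
    records, pos = [], 0
    while True:
        match = find_terminator(text, pos)
        if match is None:
            if pos < len(text):
                records.append(text[pos:])
            return records
        records.append(text[pos:match[0]])
        pos = match[1]
```

Each character is looked at once and there is no backtracking, which is
the property that should keep this close to a fixed-string search. One
caveat for a real reader: a CR at the end of a buffer cannot be
classified until the next character arrives.
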
It looks to me like Perl should distinguish between 16-bit Unicode text
files and plain old ASCII-compatible files (including UTF-8). I wonder
whether the program should be told explicitly that a file is Unicode.
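
One conceivable way to draw that distinction, where the file happens to
start with a byte order mark, is to sniff the first few bytes. The
sketch below (Python, for illustration; sniff_encoding is a made-up
name) assumes a BOM is present, which is by no means guaranteed, so a
None result only means "unknown":

```python
import codecs

def sniff_encoding(head):
    """Guess an encoding from a leading byte order mark, or return None."""
    if head.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "utf-16-be"
    if head.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "utf-16-le"
    if head.startswith(codecs.BOM_UTF8):       # EF BB BF
        return "utf-8"
    return None   # no BOM: could be ASCII, UTF-8, or anything else
```

Without a BOM the bytes alone do not settle the question, which is why
an explicit notification from the program may still be needed.
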
--
Bart.