Perl under Unix happily slurps the entire Mac file as a single line. I'm
playing with a version that 'buffers' the entire input stream rather than
handling it line by line in all cases. For extremely large files this may
break if you run out of memory, but it seems pretty fast and means I handle
all cases consistently.
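A minimal sketch of the whole-buffer idea, assuming the input fits in memory. This is not Josh's actual version; the helper name normalize_eol and the sample string are hypothetical. Setting $/ to undef makes one read return the whole stream, and a single global substitution then treats DOS, Mac and Unix endings the same way. Because the alternation converts each line ending on its own (CRLF is tried first so it counts once), blank lines survive, unlike with [$dos]+.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn every CRLF, lone CR, or lone LF in $text into $eol.
# CRLF is listed first so a DOS ending is replaced once, not twice.
sub normalize_eol {
    my ($text, $eol) = @_;
    $text =~ s/\x0D\x0A|\x0D|\x0A/$eol/g;
    return $text;
}

# In a real filter the whole stream would be read in one slurp:
#   binmode STDIN; binmode STDOUT;
#   my $input = do { local $/; <STDIN> };
#   print normalize_eol($input, "\x0A") if defined $input;

# Made-up input mixing DOS, Mac and Unix endings, plus one blank line.
my $mixed = "dos\x0D\x0Amac\x0Dunix\x0A\x0D\x0Ablank above\x0D";
my $unix  = normalize_eol($mixed, "\x0A");
print $unix;
```

The cost is memory proportional to the file size, which is the trade-off Josh mentions for very large files.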
Josh
On Thu, 28 Mar 2002, Wade Johnson wrote:
> I think the 'g' breaks '$' processing because Unix would read the
> whole Mac file in as a single line (there are no "\n" characters). This
> brings up another useful point: if Mac files are read in as a single
> line, then the 'g' option would be _required_ in order to process all lines.
>
> Probably worth a check to see if Perl under Unix would actually read
> a whole Mac file in one slurp.
>
> The 'g' should not, in general, have any effect on whitespace.
> However, the [$dos]+ would gobble up extra empty lines from a Mac
> file, because the whole file is loaded at once.
>
> G. Wade
>
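Wade's suggested check can be run without a real Mac file. A minimal sketch, assuming Perl 5.8 or later (for the in-memory filehandle) and a Unix host, where the default $/ is "\x0A"; the sample data is made up. It reads CR-only data with the same <FH> line-reading used by the script and shows why 'g' matters once everything arrives as one record.

```perl
use strict;
use warnings;

# Three "lines" with Mac-style (CR-only) endings.
my $mac_data = "one\x0Dtwo\x0Dthree\x0D";

# Read it with the default $/ ("\n"), as the while(<STDIN>) loop does.
open my $fh, '<', \$mac_data or die "cannot open in-memory file: $!";
my @records = <$fh>;
close $fh;

# There is no LF anywhere, so the whole file comes back as one record.
my $count = scalar @records;
print "records read: $count\n";

# Without /g, a mac2unix-style substitution fixes only the first CR...
(my $once = $records[0]) =~ s/\x0D/\x0A/;

# ...while /g converts every CR in the record.
(my $all = $records[0]) =~ s/\x0D/\x0A/g;
```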
> -----Original Message-----
> From: Joshua Polterock [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, March 28, 2002 10:17 AM
> To: Brad Baxter
> Cc: [EMAIL PROTECTED]
> Subject: Re: unix2dos and dos2unix
>
>
> On Thu, 28 Mar 2002, Brad Baxter wrote:
>
> > Re:
> >
> > > crlf => sub { s/[$dos$mac$unix]+/$myPlatform/og },
> >
> > Three (hair-splitting) questions:
> >
> > 1. Isn't [$dos$mac$unix]+ going to match the same as [$dos]+?
>
> Yes.
>
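Question 1 can be confirmed mechanically. A small sketch, assuming a Unix host where "\n" is "\x0A": it tries every byte value against both character classes and collects any byte on which they disagree.

```perl
use strict;
use warnings;

my $dos  = "\x0D\x0A";
my $mac  = "\x0D";
my $unix = "\n";    # "\x0A" on Unix

# Bytes matched by exactly one of the two classes (should be none).
my @differ = grep {
    my $c = chr $_;
    ($c =~ /\A[$dos$mac$unix]\z/) xor ($c =~ /\A[$dos]\z/)
} 0 .. 255;

print @differ ? "classes differ\n" : "classes are identical\n";
```

Since $mac and $unix only repeat characters already in $dos, the longer class adds nothing.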
> > 2. Do you need 'g'?
>
> I think this comes down to a matter of taste. I generally like to globally
> replace strings of returns with a single newline. Including the 'g'
> switch compresses extra white space; dropping the 'g' switch preserves
> extra white space. At least, that is what I see when importing with mac2unix.
>
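To make the 'g' trade-off concrete, here is a sketch of the crlf substitution applied to a made-up Mac buffer that contains a blank line, assuming the whole file arrives as one record and the target is Unix.

```perl
use strict;
use warnings;

my $dos        = "\x0D\x0A";
my $myPlatform = "\x0A";    # assume a Unix target

# A slurped Mac buffer with a blank line (two CRs in a row) after "one".
my $buffer = "one\x0D\x0Dtwo\x0Dthree\x0D";

# With /g every run of CR/LF becomes one newline, so the blank line collapses.
(my $with_g = $buffer) =~ s/[$dos]+/$myPlatform/g;

# Without /g only the first run is replaced; the later CRs are left alone.
(my $without_g = $buffer) =~ s/[$dos]+/$myPlatform/;

# Show CR and LF as \r and \n so the difference is visible.
for my $pair (['with g', $with_g], ['without g', $without_g]) {
    (my $shown = $pair->[1]) =~ s/\x0D/\\r/g;
    $shown =~ s/\x0A/\\n/g;
    print "$pair->[0]: $shown\n";
}
```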
> > 3. Can anyone benchmark whether s/[$dos]+$/$myPlatform/o is faster than
> > s/[$dos]+/$myPlatform/o? It seems like it ought to be, but my efforts
> > haven't proven it.
>
> I do not know if it runs faster, but the added '$' in the regular expression
> breaks the import I describe above when importing Mac text with runs of
> returns.
>
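A sketch for question 3, using the core Benchmark module's cmpthese. The test line and buffer are made up, and timings will vary by machine and Perl version, so no result is claimed here. The first part shows why '$' breaks a slurped Mac buffer: the anchored pattern can only match the run at the very end of the string.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $dos        = "\x0D\x0A";
my $myPlatform = "\x0A";    # assume a Unix target

# A slurped Mac buffer: one record with CRs throughout.
my $mac_buffer = "one\x0D\x0Dtwo\x0Dthree\x0D";

# Anchored: only the trailing CR is replaced; the inner CRs survive.
(my $anchored = $mac_buffer) =~ s/[$dos]+$/$myPlatform/o;

# Unanchored: the first run of CR/LF anywhere in the string is replaced.
(my $unanchored = $mac_buffer) =~ s/[$dos]+/$myPlatform/o;

# Speed comparison on a typical line-by-line DOS record (70 chars + CRLF).
my $dos_line = ("x" x 70) . "\x0D\x0A";
cmpthese(-1, {    # -1: run each variant for at least 1 CPU second
    anchored   => sub { (my $s = $dos_line) =~ s/[$dos]+$/$myPlatform/o },
    unanchored => sub { (my $s = $dos_line) =~ s/[$dos]+/$myPlatform/o },
});
```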
> Josh
>
> >
> > Brad
> >
> >
> > On Wed, 27 Mar 2002, Joshua Polterock wrote:
> >
> > >
> > > On Wed, 27 Mar 2002, William R Ward wrote:
> > >
> > > > Programming by committee! This is kind of fun, in a twisted way. OK,
> > > > incorporating John's and Wade's suggestions:
> > > >
> > >
> > > With everything included, I believe we arrive at the version below.
> > >
> > > Do the '\x0D' and '\x0A', which appear to work fine on my Solaris 2.7
> > > box, make this more portable? Also, do we still have platform-centricity
> > > in the
> > >
> > > my $unix = "\n";
> > >
> > > statement?
> > >
> > > Josh
> > >
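> > > #!/usr/bin/perl -w
> > >
> > > use strict;
> > > use File::Basename;
> > >
> > > my $dos = "\x0D\x0A";
> > > my $mac = "\x0D";
> > > my $unix = "\n";
> > > my $myPlatform = $unix;
> > > my $iam = basename($0);    # dispatch on the name we were invoked as
> > >
> > > # Compare $^O exactly: a pattern like /win/i also matches "darwin"
> > > # (Mac OS X) and "cygwin", which both use Unix line endings.
> > > $myPlatform = $mac if $^O eq 'MacOS';
> > > $myPlatform = $dos if $^O eq 'MSWin32' || $^O eq 'dos';
> > >
> > > # Read and write raw bytes so the Windows :crlf layer does not
> > > # translate line endings behind our back.
> > > binmode STDIN;
> > > binmode STDOUT;
> > >
> > > # Each sub edits $_ (the current line) in place.
> > > my %subs = (
> > >     crlf     => sub { s/[$dos$mac$unix]+/$myPlatform/og },
> > >     dos2unix => sub { s/$dos/$unix/o },
> > >     unix2dos => sub { s/$unix/$dos/o },
> > >     mac2dos  => sub { s/$mac/$dos/o },
> > >     mac2unix => sub { s/$mac/$unix/o },
> > >     dos2mac  => sub { s/$dos/$mac/o },
> > >     unix2mac => sub { s/$unix/$mac/o },
> > > );
> > >
> > > die "$iam: I do not recognize my own name.\n"
> > >     unless exists $subs{$iam};
> > >
> > > my $sub = $subs{$iam};
> > > while (<STDIN>) {
> > >     $sub->();
> > >     print;
> > > }
> > >
> > > exit(0);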
> >
>
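On the my $unix = "\n"; question: perlport notes that in MacPerl "\n" means "\015" (CR), so spelling the bytes out as "\x0A" removes that dependency. Below is a sketch of the script's six pairwise conversions rewritten that way, with /g added so they also work on a slurped Mac buffer. The helper convert and the sample text are hypothetical, and the conversions still assume clean input (unix2dos on text that already has CRLF would produce CR CR LF).

```perl
use strict;
use warnings;

# Spell the bytes out so the host's meaning of "\n" does not matter.
my $dos  = "\x0D\x0A";
my $mac  = "\x0D";
my $unix = "\x0A";

# The script's pairwise conversions, with /g so a whole buffer converts at once.
# Each sub edits its first argument in place through the @_ alias.
my %subs = (
    dos2unix => sub { $_[0] =~ s/$dos/$unix/go },
    unix2dos => sub { $_[0] =~ s/$unix/$dos/go },
    mac2dos  => sub { $_[0] =~ s/$mac/$dos/go },
    mac2unix => sub { $_[0] =~ s/$mac/$unix/go },
    dos2mac  => sub { $_[0] =~ s/$dos/$mac/go },
    unix2mac => sub { $_[0] =~ s/$unix/$mac/go },
);

# Hypothetical helper: run one named conversion on a copy of $text.
sub convert {
    my ($name, $text) = @_;
    my $sub = $subs{$name} or die "unknown conversion '$name'\n";
    $sub->($text);    # $_[0] inside the sub aliases this $text
    return $text;
}

# Round trip a small Unix buffer (including a blank line) through DOS.
my $unix_text = "line one\x0Aline two\x0A\x0Aline four\x0A";
my $dos_text  = convert('unix2dos', $unix_text);
my $back      = convert('dos2unix', $dos_text);
```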