I second the "binmode" suggestion. You can unpack ASCII out of a binary file
faster than you can step through a text file and write it out.

Scot Robnett
inSite Internet Solutions
[EMAIL PROTECTED]


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]]On Behalf Of
Jaime Teng
Sent: Thursday, November 29, 2001 4:02 AM
To: [EMAIL PROTECTED];
[EMAIL PROTECTED]
Subject: Re: Writing Large Text Files Quickly


At 08:38 PM 11/19/01 -0000, [EMAIL PROTECTED] wrote:
>Hi All,
>
>Here's an interesting problem which I'd like your help on. Suppose you've
>opened a huge text file (open INPUT, "input.txt"). What is the fastest way
>to "print" this filehandle to another file (open OUTPUT, ">output.txt")?
>
>Having discovered that
>       while(<INPUT>) {print OUTPUT $_}
>is significantly slower than
>       print OUTPUT <INPUT>
>I began thinking about what the best way might be. First I played with
>"undef"ing $/ which improved things. Then I tried using "read" rather than
>the implicit "readline" function which <FH> provides. This was even better.
>But I can't help thinking that I should be writing less code, not more to
>make this faster. Surely there is some internal Perl redirection function
>that'll make this run as fast as my hard disk can go?
>
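
On the question of a built-in way to do this: one candidate is the core
File::Copy module, whose copy() takes two file names (or filehandles) and
does the buffering itself. A minimal sketch, assuming nothing has to be done
to the data on the way through; copy_file is a hypothetical wrapper name,
not something from the original post, and no timing is claimed for it here.

```perl
use strict;
use warnings;
use File::Copy;    # core module; exports copy()

# Hypothetical wrapper: copy $src to $dst and die with the OS error on failure.
sub copy_file {
    my ($src, $dst) = @_;
    copy($src, $dst) or die "Copy of $src to $dst failed: $!";
    return -s $dst;    # size of the new file in bytes
}
```

With the file names from the question this would be called as
copy_file("input.txt", "output.txt").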

If you really have to read the file, then I suggest two ways:

FIRST WAY
open SFILE, "filename.1" or die $!;
@DATA = <SFILE>;    # read every line into memory
close SFILE;
open DFILE, ">filename.2" or die $!;
print DFILE @DATA;
close DFILE;
This is much faster than reading one line at a time. However,
you need enough memory to hold the whole file.

SECOND WAY
open SFILE, "filename.1" or die $!;
binmode SFILE;
open DFILE, ">filename.2" or die $!;
binmode DFILE;
while (read(SFILE, $DATA, 131072))    # 128K per read
{
  print DFILE $DATA;
}
close DFILE;
close SFILE;
This is faster than your example because it takes far fewer loop
iterations and loads 128K of data into memory in one sweep, as against
loading one line of text at a time as you did.
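
The same chunked copy can be wrapped in a sub with error checks. This is
only a sketch: copy_in_chunks and its arguments are hypothetical names, and
the 128K default is simply the figure used above, not a tuned value.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $src to $dst in fixed-size binary chunks.
sub copy_in_chunks {
    my ($src, $dst, $chunk_size) = @_;
    $chunk_size ||= 128 * 1024;    # default to 128K per read

    open(my $in,  '<', $src) or die "Cannot read $src: $!";
    open(my $out, '>', $dst) or die "Cannot write $dst: $!";
    binmode $in;                   # no CRLF translation on Win32
    binmode $out;

    my ($buf, $total) = ('', 0);
    while (1) {
        my $got = read($in, $buf, $chunk_size);
        die "Read error on $src: $!" unless defined $got;
        last if $got == 0;         # end of file
        print {$out} $buf or die "Write error on $dst: $!";
        $total += $got;
    }
    close $out or die "Cannot close $dst: $!";
    close $in;
    return $total;                 # number of bytes copied
}
```

A call such as copy_in_chunks("filename.1", "filename.2") uses the default
chunk size; a third argument overrides it, which makes it easy to try other
sizes under Benchmark.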






>Your suggestions are most welcome.
>
>Alistair
>
>PS Here's my Benchmark output and test code.
>input.txt is a 88,000 line 9MB test file.
>Tech details NT 4, Pentium II 350MHz, running Activestate Perl 5.6.1 build
>630.
>
>D:\>perl timethese.pl
>Benchmark: timing 5 iterations of List_return, List_undef, Scalar_Read,
>Scalar_return, Scalar_undef...
>List_return:   42 wallclock secs (29.29 usr +  3.10 sys = 32.40 CPU) @  0.15/s (n=5)
>List_undef:    17 wallclock secs ( 3.09 usr +  5.67 sys =  8.76 CPU) @  0.57/s (n=5)
>Scalar_Read:   12 wallclock secs ( 2.99 usr +  1.76 sys =  4.76 CPU) @  1.05/s (n=5)
>Scalar_return: 32 wallclock secs (22.33 usr +  3.10 sys = 25.44 CPU) @  0.20/s (n=5)
>Scalar_undef:  18 wallclock secs ( 2.90 usr +  4.99 sys =  7.89 CPU) @  0.63/s (n=5)
>
>use Benchmark;
>timethese(5, {
>       Scalar_return   => '&Scalar_Context',
>       Scalar_undef    => '&Scalar_Context_undef_dollar_slash',
>       List_return     => '&List_Context',
>       List_undef              => '&List_Context_undef_dollar_slash',
>       Scalar_Read             => '&Scalar_Read'
>});
>
>sub Scalar_Context {
>       open I, "input.txt" or die $!;
>       open O, ">output.txt" or die $!;
>       while (<I>) {
>               print O $_;
>       }
>       close O;
>       close I;
>}
>
>sub List_Context {
>       open I, "input.txt" or die $!;
>       open O, ">output.txt" or die $!;
>       print O <I>;
>       close O;
>       close I;
>}
>
>sub List_Context_undef_dollar_slash {
>       my $old_dollar_slash=$/;
>       undef $/;
>       open I, "input.txt" or die $!;
>       open O, ">output.txt" or die $!;
>       print O <I>;
>       close O;
>       close I;
>       $/=$old_dollar_slash;
>}
>
>sub Scalar_Context_undef_dollar_slash {
>       my $old_dollar_slash=$/;
>       undef $/;
>       open I, "input.txt" or die $!;
>       open O, ">output.txt" or die $!;
>       $_=<I>;
>       print O $_;
>       close O;
>       close I;
>       $/=$old_dollar_slash;
>}
>
>
>sub Scalar_Read {
>       open I, "input.txt" or die $!;
>       open O, ">output.txt" or die $!;
>       while (read  I,$_,1024*1024) {
>               print O $_;
>       }
>       close O;
>       close I;
>}
>
>> ----------------------------------------------------------------------
>> Alistair McGlinchy,           [EMAIL PROTECTED]
>> Sizing and Performance, Central IT,   ext. 5012,   ph +44 20 7268-5012
>> Marks and Spencer, 3 Longwalk Rd, Stockley Park, Uxbridge UB11 1AW, UK
>>
>
>
>-----------------------------------------------------------------------
>
>
>Registered Office:
>Marks & Spencer p.l.c
>Michael House, Baker Street,
>London, W1U 8EP
>Registered No. 214436 in England and Wales.
>
>Telephone (020) 7935 4422
>Facsimile (020) 7487 2670
>
>www.marksandspencer.com
>
>Please note that electronic mail may be monitored.
>
>This e-mail is confidential. If you received it by mistake, please let us
>know and then delete it from your system; you should not copy, disclose, or
>distribute its contents to anyone nor act in reliance on this e-mail, as
>this is prohibited and may be unlawful.
>
>The registered office of Marks and Spencer Financial Services Limited,
>Marks and Spencer Unit Trust Management Limited, Marks and Spencer Life
>Assurance Limited and Marks and Spencer Savings and Investments Limited is
>Kings Meadow, Chester, CH99 9FB.
>
>_______________________________________________
>Perl-Win32-Users mailing list
>[EMAIL PROTECTED]
>http://listserv.ActiveState.com/mailman/listinfo/perl-win32-users
>
>
>