The files I am dealing with can be 42,000 MB or larger. I tried fancier ways of processing them, with arrays and such, and ran out of memory every time.

I finally had to re-write my entire script to process line by line, holding almost nothing in memory. It is not pretty and not fancy, but it appears to be the only viable option.
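
For what it is worth, here is a rough sketch of the record-at-a-time idea applied to the character count in the script quoted below. It is not my production script, just an illustration. It assumes the row delimiter is the literal string ||~||~|| and the column delimiter is ||~| (taken from the quoted regexes), and the sub name count_chars is made up for this example. Setting $/ to the row delimiter makes each read return a single record, so only one record sits in memory at a time.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how often each byte value occurs in the fields of a delimited file,
# reading one record at a time instead of loading the whole file.
# count_chars is a hypothetical name; the delimiters are passed in as
# literal strings (an assumption based on the quoted script).
sub count_chars {
    my ($path, $row_delim, $col_delim) = @_;
    my %count;

    open(my $fh, '<', $path) or die "Cannot open $path: $!\n";
    binmode($fh);    # treat the data as raw bytes

    # With $/ set to the row delimiter, <$fh> returns one record per read.
    local $/ = $row_delim;

    while (my $record = <$fh>) {
        chomp $record;    # chomp strips the trailing $/ (the row delimiter)
        foreach my $col (split /\Q$col_delim\E/, $record) {
            # unpack 'C*' yields the numeric value of every byte in $col
            $count{$_}++ for unpack 'C*', $col;
        }
    }
    close($fh);
    return \%count;
}

# Usage: perl count.pl C:\thisfile.txt
if (@ARGV) {
    my $counts = count_chars($ARGV[0], '||~||~||', '||~|');
    foreach my $code (sort { $a <=> $b } keys %$counts) {
        print "$code (" . chr($code) . ") - $counts->{$code}\n";
    }
}
```

Memory use then depends on the size of the largest single record rather than the size of the file. If one record can itself be huge, this will not help, and reading fixed-size blocks with read() would be the next thing to try.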

Steven Manross wrote:
Can anyone think of a less memory-intensive way of doing this
(besides chopping the file up into smaller chunks, which is my next
step unless I receive a better option)?

The txt file is 335MB.

The column and row delimiters can change, but each needs to be an "odd
multi-char delimiter" due to the nature of what the fields can
contain (just about anything).

I started this Saturday morning. It has consumed 591MB of memory so far
and is still splitting the file into @text_array (my box has 512MB RAM
and a 1GB page file, which I bumped up prior to starting this script).

Needless to say, the script is paging (as I knew it would), and I was
hoping that it would be done after the long weekend, but it isn't...  :(

Any pointers?

Code follows.........

print "script started\n";
open (FILE,"C:\\thisfile.txt") || die "File open error: $!\n";
print "file opened\n";
@text_array = split(/\|\|\~\|\|\~\|\|/,<FILE>);
print "array parsed (count = ".scalar(@text_array).")\n";

close (FILE);

print "file closed\n";
foreach my $line (@text_array) {
  $x++;
  $y++;
  if ($x> 1000) {
    print ".";
    $x=0;
  }
  my @columns = split(/\|\|\~\|/,$line);
  foreach $col (@columns) {
    my $col_len = length($col);
    for my $pos (0..($col_len - 1)) {
      my $char = substr($col,$pos,1);
      my $ord_char = ord($char);
      $hash{$ord_char} = $hash{$ord_char} + 1;
    }
  }
}

foreach $ord_char (sort {$a <=> $b} keys %hash) {
  print $ord_char." (".chr($ord_char).") - ".$hash{$ord_char}."\n";
}

_______________________________________________
Perl-Win32-Users mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs


--



Craig Cardimon, Programmer
AUS Inc.
(Knowledge Express Data Systems; 1-800-529-5337, ext. 24)