Looks to me like you're reading the whole (potentially huge) file into
memory, and then running the regex over the entire file at once.
How about reading the file in smaller chunks and running the regex on a
per-chunk basis?
Also, do you really want to put all the results into an array? That
variable could get large. How about just echoing each match, so the
output can be redirected to a file?


//this code assumes no single match is longer than the chunk size
//(100 chars here).
  $handle = fopen($filename, "r");
  $last = "";
  while (($chunk = fread($handle, 100)) !== false && $chunk !== "") {

     //check this chunk together with the previous one, in case a
     //match crosses the chunk boundary.
     preg_match_all($pattern, $last.$chunk, $matches, PREG_OFFSET_CAPTURE);

     //only emit matches that extend into the current chunk - anything
     //entirely inside $last was already emitted on the previous pass,
     //which avoids printing boundary-straddling duplicates.
     foreach ($matches[0] as $m) {
        if ($m[1] + strlen($m[0]) > strlen($last)) echo "{$m[0]}\n";
     }
     $last = $chunk;
  }
  fclose($handle);


On Tue, Jan 6, 2009 at 1:55 PM, Michael <[email protected]> wrote:
>
> Part 3 - In reference to the below code - I have replaced the fopen(), fread()
> and fclose() with file_get_contents()
>
> This does not result in any noticeable speed improvement.
>
> What does however, is taking out the array_merge() statement. To say this
> makes it a lot quicker is an understatement - it's a hell of a difference.
>
> Also the speed from start to finish is now linear. This would explain the
> previously noted slowdown - the further it got through (and the bigger the
> overall array got), the slower each merge became, even though it was
> only merging small quantities each time (5 to 20 records on average).
>
> preg_match_all() is run once per file.
>
> The output from preg_match_all is an array - afaik there is no other option
> here.
>
> The question is what can I then do with the array output from preg_match_all()
> to store it, along with the combined data from the previous files scanned,
> that does not involve CPU intensive and increasingly slow (the more files
> processed) calls to array_merge() as it gets through the job?
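
For what it's worth, here's a minimal sketch of the usual fix (same
variable names as your code below; the sample $matches is a stand-in for
preg_match_all() output): append the new matches in place instead of
calling array_merge(). array_merge() copies the whole accumulated array
on every call, so each call gets slower as $results grows; appending
only pays for the new items.

```php
$results = array("earlier match");           // accumulated so far
$matches = array(0 => array("foo", "bar"));  // stand-in for preg_match_all() output

// instead of: $results = array_merge($matches[0], $results);
foreach ($matches[0] as $m) {
    $results[] = $m;
}
```

(Order differs from your array_merge() call - new matches land at the
end rather than the start - but for collecting matches that shouldn't
matter.)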
>
> FYI - each file is text, usually under 20kb, and has 5-20 regex 'matches' on
> average.
>
> -----------------------------------------
> Ok. It is probably about time I posted the awfully slow section of code for
> people to look at and give their opinions -
>
> (Prior to this part the code has recursively scanned a directory and created
> an array with all the path / file names).
>
> foreach ($files as $filename) {
>
> // Determine MIME type: (This uses pear mime_type module)
> $mt=MIME_Type::autoDetect("$filename");
>
> // If suitable MIME type open, read and process: (ie: all text files)
> if (substr($mt,0,4) == "text") {
>
> // obtain filesize needed for fread():
> // $fs = filesize("$filename"); COMMENTED OUT NOW
>
> // open the file:
> // $thefile = fopen("$filename","rb"); COMMENTED OUT NOW
>
> // read the file: (fread() replaced with file_get_contents())
> // $content = fread($thefile,$fs); COMMENTED OUT NOW
> $content = file_get_contents($filename);
>
> // extract the information: ($pattern is a previously defined PCRE regex)
> preg_match_all($pattern,$content,$matches);
>
> // add it to our array:
> $results = array_merge($matches[0],$results);
>
> // unset the temporary array:
> unset($matches);
>
> // close the file:
> // fclose($thefile); COMMENTED OUT NOW
>
> // count valid file(s) scanned:
> $pv = $pv + 1;
>
> }; // end the MIME type statement
>
> // count files(s) scanned:
> $p1 = $p1 + 1;
> $bar1->update($p1);
>
> // count data scanned: ($fs is stale now that filesize() is commented
> // out - use the length of what was actually read instead)
> $fc = $fc + strlen($content);
>
> }; // end loop:
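
Better yet, combine that with the "just echo it" idea above and skip the
in-memory array entirely - append each match to an output file as it's
found. A sketch (the results.txt name and the sample $matches are
hypothetical; file_put_contents() with the FILE_APPEND flag does the
appending):

```php
$outfile = "results.txt";                    // hypothetical output file
$matches = array(0 => array("foo", "bar"));  // stand-in for preg_match_all() output

// inside the per-file loop, after preg_match_all():
foreach ($matches[0] as $m) {
    file_put_contents($outfile, $m . "\n", FILE_APPEND);
}
```

If the per-call overhead of file_put_contents() matters, fopen() the
output file once in "a" mode before the loop and fwrite() to it instead.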
>
>



-- 
No Guilt apon accusation!
http://creativefreedom.org.nz

--~--~---------~--~----~------------~-------~--~----~
NZ PHP Users Group: http://groups.google.com/group/nzphpug
To post, send email to [email protected]
To unsubscribe, send email to
[email protected]
-~----------~----~----~----~------~----~------~--~---
