Re: Re: [PHP] Regular Expression - it works but uses way too much memory ?

2004-06-19 Thread Ulrik S. Kofod

Robin Vickery wrote:

 The S modifier that you're using means that it's storing the studied
 expression. If the regexp changes each time around the loop then over
 3 iterations, that'll add up. See if removing that modifier helps
 at all.

The S modifier wasn't needed; I added it because I thought it would speed things up,
but it didn't. Removing it didn't reduce the memory usage, but the script performs a
little better without it.

 If that's not it, then these *might* save you some memory, although
 I've not tested them:

 I'm not entirely sure why you're matching (.*) at the end then putting
 it back in with your replacement text. Without running it, I'd have
 thought that you could leave out the (.*) from your pattern and the $4
 from your replacement and get exactly the same effect.


I tried removing $4 and (.*), but the result isn't the same. My first regexp
actually didn't have $4, but I had to add it: without it, 51 of the 1246 texts
aren't processed correctly. There also isn't really any difference in how it
performs with or without it.


 You could use a non-capturing subpattern for $2 which you're not using
 in your replacement.

   $replace = "/^((?:[a-z]+?[^a-z]+?){".($count)."})(".$typedmask.")/i";

I didn't know you could do that... cool :). This made the script run a little
faster, but it still uses the same amount of memory.
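
To illustrate the non-capturing group discussed above, here is a minimal sketch. The sample text and the count of 2 are made up for the illustration and are not taken from the script.

```php
<?php
// Hypothetical sample input; not from the original script.
$text = 'one two go home';

// Capturing inner group: the repeated group is stored as $2
// and shifts the numbering of every later group.
preg_match('/^(([a-z]+?[^a-z]+?){2})(go)/i', $text, $capturing);

// Non-capturing inner group (?:...): nothing is stored for it.
preg_match('/^((?:[a-z]+?[^a-z]+?){2})(go)/i', $text, $nonCapturing);

echo count($capturing), "\n";     // 4 entries: whole match, $1, $2, $3
echo count($nonCapturing), "\n";  // 3 entries: whole match, $1, $2
```

With the non-capturing form the typed word moves from $3 to $2, which is why the replacement string has to be renumbered as well.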


 And maybe a look-behind assertion for the first subpattern rather than
 a capturing pattern then re-inserting $1.

   $replace = "/^(?<=(?:[a-z]+?[^a-z]+?){".($count)."})(".$typedmask.")/i";
   $with = "<error-start sourcetext=".$corr['sourcetext']." id=".$corr['id']."
   ...


With ?<= I get a lot of warnings.

Here is an example:
$replace is '/^(?<=(?:[a-z]+?[^a-z]+?){50})(go)(.*)/i'
$with is '<error-start sourcetext=3 id=49 group="-" class="-" corrected-from="go"
corrected-to="god">$2<error-end sourcetext=3 id=49>$3'

Warning: Compilation failed: lookbehind assertion is not fixed length at
offset 34
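
The warning comes from PCRE requiring lookbehind assertions to have a fixed length, which a group built from +? quantifiers does not have. One possible workaround, not suggested in the thread and only available in newer PCRE versions, is the \K escape, which drops everything matched so far from the reported match. A minimal sketch, with made-up values standing in for $count and $typedmask:

```php
<?php
// Hypothetical sample values standing in for $count and $typedmask.
$text      = 'one two go home';
$count     = 2;
$typedmask = 'go';

// \K resets the start of the reported match, so the skipped words are
// matched but not replaced and need not be re-inserted through $1.
$replace = '/^(?:[a-z]+?[^a-z]+?){' . $count . '}\K(' . $typedmask . ')/i';
$with    = '[$1]';

echo preg_replace($replace, $with, $text, 1), "\n";  // one two [go] home
```

Whether this changes the memory behaviour described here is an open question; it only removes the need for the first capturing group.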


With the corrections added, the regexp looks like this:
$typedmask = preg_replace("/\s+/", ".*?", $corr['typed']);

$replace = '/^((?:[a-z]+?[^a-z]+?){'.($count).'})('.$typedmask.')(.*)/i';

$with = '$1<error-start sourcetext='.$corr['sourcetext'].' id='.$corr['id'].'
group="'.$corr['grupper'].'" class="'.$corr['ordklasse'].'"
corrected-from="'.$corr['typed'].'"
corrected-to="'.$corr['corrected'].'">$2<error-end
sourcetext='.$corr['sourcetext'].' id='.$corr['id'].'>$3';

$text = $skipText[0] . preg_replace ($replace,$with,$text,1);
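
One thing the snippet above leaves open is that $corr['typed'] goes into the pattern unescaped, so a typed word containing regex metacharacters would change the meaning of the pattern. The following is a sketch of one way to guard against that, assuming PHP 7 or later; buildTypedMask is a hypothetical helper name, not part of the original script.

```php
<?php
// Hypothetical helper: build $typedmask from the raw typed text.
function buildTypedMask(string $typed): string
{
    // Split on whitespace, escape each piece (including the "/" delimiter),
    // then join the pieces with the same lazy wildcard the script uses.
    $parts  = preg_split('/\s+/', $typed, -1, PREG_SPLIT_NO_EMPTY);
    $quoted = array_map(function ($part) {
        return preg_quote($part, '/');
    }, $parts);
    return implode('.*?', $quoted);
}

echo buildTypedMask('a.m. c++'), "\n";  // a\.m\..*?c\+\+
```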

It completes a little faster and the output is exactly the same as before,
but it still uses way too much memory.

[EMAIL PROTECTED] testextract]# time php ../export.php > export6.txt
real    1m15.851s
user    0m18.720s
sys     0m1.750s

From top just before the script completed:
  PID USER PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 7843 root  17   0  269M 269M  3328 R    41.7 53.6   0:19 php
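
The top snapshot only shows the end state. As a rough sketch (made-up data, not the original script), memory_get_usage() can be sampled around the loop to see how much the preg_replace calls themselves add:

```php
<?php
// Made-up stand-in data; the real script loops over its own corrections.
$text   = str_repeat('lorem ipsum go dolor ', 50);
$before = memory_get_usage();

for ($count = 1; $count <= 100; $count++) {
    // A different pattern string on every iteration, as in the script.
    $replace = '/^((?:[a-z]+?[^a-z]+?){' . $count . '})(go)/i';
    $result  = preg_replace($replace, '$1[$2]', $text, 1);
}

$after = memory_get_usage();
printf("memory grew by %d bytes over 100 calls\n", $after - $before);
```

The reported number depends on the PHP and PCRE versions in use, so it is only meaningful when compared between variants of the pattern on the same machine.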

This isn't a huge problem anymore, as we have been allowed to move the project to a
3 times faster server with less activity (because of this).

But I would still like to know if there is a solution to this, because it seems quite
insane that it allocates more than 250MB of memory to generate 4MB of output.
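
One direction that avoids building a new pattern for every correction, offered only as a sketch and not something proposed in the thread: locate the word offsets with a single fixed pattern and splice the markup in with substr_replace(). It assumes PHP 7.1 or later and single-word corrections; markWord and the sample data are hypothetical.

```php
<?php
// Hypothetical helper: wrap the word at zero-based index $count in markers.
function markWord(string $text, int $count, string $open, string $close): string
{
    // One fixed pattern for every call; PREG_OFFSET_CAPTURE reports byte offsets.
    preg_match_all('/[a-z]+/i', $text, $m, PREG_OFFSET_CAPTURE);
    if (!isset($m[0][$count])) {
        return $text; // fewer words than expected: leave the text untouched
    }
    [$word, $offset] = $m[0][$count];
    return substr_replace($text, $open . $word . $close, $offset, strlen($word));
}

echo markWord('one two go home', 2, '[', ']'), "\n";  // one two [go] home
```

The sketch does not check that the word found matches the typed text, and it does not handle a typed mask that spans several words; both would need to be added before it could stand in for the regexp above.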

Thanks Robin! I really appreciate your answer.

Brgds Ulrik

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Regular Expression - it works but uses way too much memory ?

2004-06-18 Thread Ulrik S. Kofod
Sorry to post this again but it's a little urgent.

The preg_replace in my script allocates a little memory every time it is called and
doesn't free it again until the script ends.

I don't know if this is normal behaviour for preg_replace or if it is my regexp
that causes this.

The problem is that I call preg_replace a little more than 3 times and that
causes quite a lot of memory to be allocated.

Does anybody know if this is a problem in preg_replace, or is there a better
mailing list/forum for this kind of question?

Any answer will be appreciated :)


Ulrik S. Kofod wrote:

 $replace = "/^(([a-z]+?[^a-z]+?){".($count)."})(".$typedmask.")(.*)/iS";
 $with = "$1<error-start sourcetext=".$corr['sourcetext']." id=".$corr['id']."
 group=\"".$corr['grupper']."\" class=\"".$corr['ordklasse']."\"
 corrected-from=\"".$corr['typed']."\"
 corrected-to=\"".$corr['corrected']."\">$3<error-end
 sourcetext=".$corr['sourcetext']." id=".$corr['id'].">$4";
 $text = preg_replace ($replace,$with,$text,1);


 The preg_replace above works as expected. I have it inside a loop that processes 1000
 texts and replaces a total of 3 words.

 The problem is that it accumulates memory up to 200MB, which isn't freed again until
 the script completes.

 It runs for about 1 - 2 minutes and allocates more and more memory.

 If I comment out the preg_replace the memory usage looks normal, so I'm pretty sure
 that is where the problem is.

 My question is basically: is it something in my regexp that is totally nuts, or is
 it normal behaviour for preg_replace to allocate memory and not free it again until
 the script ends?




Re: Re: [PHP] Regular Expression - it works but uses way too much memory ?

2004-06-18 Thread Robin Vickery
On Fri, 18 Jun 2004 08:57:19 +0200 (CEST), Ulrik S. Kofod
[EMAIL PROTECTED] wrote:
 
 Sorry to post this again but it's a little urgent.
 
 The preg_replace in my script allocates a little memory every time it is called and
 doesn't free it again until the script ends.
 
 I don't know if this is normal behaviour for preg_replace or if it is my regexp
 that causes this.
 
 The problem is that I call preg_replace a little more than 3 times and that
 causes quite a lot of memory to be allocated.
 
  $replace = "/^(([a-z]+?[^a-z]+?){".($count)."})(".$typedmask.")(.*)/iS";
  $with = "$1<error-start sourcetext=".$corr['sourcetext']." id=".$corr['id']."
  group=\"".$corr['grupper']."\" class=\"".$corr['ordklasse']."\"
  corrected-from=\"".$corr['typed']."\"
  corrected-to=\"".$corr['corrected']."\">$3<error-end
  sourcetext=".$corr['sourcetext']." id=".$corr['id'].">$4";
  $text = preg_replace ($replace,$with,$text,1);
 
  The problem is that it accumulates memory up to 200MB, which isn't freed again until
  the script completes.

The S modifier that you're using means that it's storing the studied
expression. If the regexp changes each time around the loop then over
3 iterations, that'll add up. See if removing that modifier helps
at all.

  $replace = "/^(([a-z]+?[^a-z]+?){".($count)."})(".$typedmask.")(.*)/i";

If that's not it, then these *might* save you some memory, although
I've not tested them:

I'm not entirely sure why you're matching (.*) at the end then putting
it back in with your replacement text. Without running it, I'd have
thought that you could leave out the (.*) from your pattern and the $4
from your replacement and get exactly the same effect.

  $replace = "/^(([a-z]+?[^a-z]+?){".($count)."})(".$typedmask.")/i";
  ...
  sourcetext=".$corr['sourcetext']." id=".$corr['id'].">";

You could use a non-capturing subpattern for $2 which you're not using
in your replacement.

  $replace = "/^((?:[a-z]+?[^a-z]+?){".($count)."})(".$typedmask.")/i";

And maybe a look-behind assertion for the first subpattern rather than
a capturing pattern then re-inserting $1.

  $replace = "/^(?<=(?:[a-z]+?[^a-z]+?){".($count)."})(".$typedmask.")/i";
  $with = "<error-start sourcetext=".$corr['sourcetext']." id=".$corr['id']."
  ...

   -robin
