Re: Re: [PHP] Regular Expression - it works but uses way too much memory ?
Robin Vickery said:
> The S modifier that you're using means that it's storing the studied expression. If the regexp changes each time around the loop then over 3 iterations, that'll add up. See if removing that modifier helps at all.

The S modifier wasn't needed; I added it because I thought it would speed things up, but it didn't. Removing it didn't help with the memory usage, but the script performs a little better without it.

> If that's not it, then these *might* save you some memory, although I've not tested them: I'm not entirely sure why you're matching (.*) at the end then putting it back in with your replacement text. Without running it, I'd have thought that you could leave out the (.*) from your pattern and the $4 from your replacement and get exactly the same effect.

I tried removing $4 and (.*) but the result isn't the same. Actually, my first reg. exp. didn't have $4, but I had to add it: without it, 51 of the 1246 texts aren't processed correctly. Also, there isn't really any difference in how it performs with or without it.

> You could use a non-capturing subpattern for $2 which you're not using in your replacement.
>
>   $replace = "/^((?:[a-z]+?[^a-z]+?){".($count)."})(".$typedmask.")/i";

I didn't know you could do that... cool :) This made the script run a little faster, but it still uses the same amount of memory.

> And maybe a look-behind assertion for the first subpattern rather than a capturing pattern then re-inserting $1.
>
>   $replace = "/^(?<=(?:[a-z]+?[^a-z]+?){".($count)."})(".$typedmask.")/i";
>   $with = "error-start sourcetext=".$corr['sourcetext']." id=".$corr['id']." ...

With ?<= I get a lot of warnings. Here is an example:

  $replace is '/^(?<=(?:[a-z]+?[^a-z]+?){50})(go)(.*)/i'
  $with is 'error-start sourcetext=3 id=49 group=- class=- corrected-from=go corrected-to=god$2error-end sourcetext=3 id=49$3'

  <br />
  <b>Warning</b>: Compilation failed: lookbehind assertion is not fixed length at offset 34

With the corrections added, the reg. exp. looks like this:

  $typedmask = preg_replace("/\s+/",".*?",$corr['typed']);
  $replace = '/^((?:[a-z]+?[^a-z]+?){'.($count).'})('.$typedmask.')(.*)/i';
  $with = '$1error-start sourcetext='.$corr['sourcetext'].' id='.$corr['id'].' group='.$corr['grupper'].' class='.$corr['ordklasse'].' corrected-from='.$corr['typed'].' corrected-to='.$corr['corrected'].'$2error-end sourcetext='.$corr['sourcetext'].' id='.$corr['id'].'$3';
  $text = $skipText[0] . preg_replace ($replace,$with,$text,1);

It completes a little faster and the output is exactly the same as before, but it still uses way too much memory.

  [EMAIL PROTECTED] testextract]# time php ../export.php export6.txt

  real    1m15.851s
  user    0m18.720s
  sys     0m1.750s

From top, just before the script completed:

  PID  USER PRI NI SIZE RSS  SHARE STAT %CPU %MEM TIME COMMAND
  7843 root 17  0  269M 269M 3328  R    41.7 53.6 0:19 php

This isn't a huge problem anymore, as we have been allowed to move the project to a 3 times faster server with less activity (because of this). But I would still like to know if there is a solution, because it seems quite insane that it allocates more than 250MB of memory to generate 4MB of output.

Thanks Robin! I really appreciate your answer.

Brgds Ulrik

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
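The warning in this message comes from PCRE requiring look-behind assertions to have a fixed length, which a repeated `[a-z]+?[^a-z]+?` group cannot have. One possible workaround, not suggested anywhere in the thread, is the `\K` escape, which drops everything matched so far from the reported match so that only the typed word gets replaced. The sketch below is untested against the original data: the sample text, the `[...]` marker and the count of 3 are made up for illustration, and `\K` needs a PCRE version newer than what was current when this thread was written.

```php
<?php
// Hypothetical sample input; the real script works on much longer texts.
$text = 'one two three go four';
$count = 3;          // number of words to skip before the typed word
$typedmask = 'go';   // stand-in for the mask built from $corr['typed']

// \K resets the start of the match, so the skipped words are left in place
// and do not have to be captured and re-inserted with $1.
$replace = '/^(?:[a-z]+?[^a-z]+?){' . $count . '}\K(' . $typedmask . ')/i';
$with = '[$1]';      // stand-in for the real error-start/error-end markup

$result = preg_replace($replace, $with, $text, 1);
echo $result, "\n";
```

Whether this changes the memory footprint at all is an open question; it only removes the need for the first capturing group.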
Re: [PHP] Regular Expression - it works but uses way too much memory ?
Sorry to post this again but it's a little urgent.

The preg_replace in my script allocates a little memory every time it is called and doesn't free it again until the script ends. I don't know if this is normal behaviour for preg_replace or if it is my reg. exp. that causes it.

The problem is that I call preg_replace a little more than 3 times and that causes quite a lot of memory to be allocated.

Doesn't anybody know if this is a problem in preg_replace, or is there a better mailing list/forum for this kind of question?

Any answer will be appreciated :)

Ulrik S. Kofod said:
> $replace = "/^(([a-z]+?[^a-z]+?){".($count)."})(".$typedmask.")(.*)/iS";
> $with = "$1error-start sourcetext=".$corr['sourcetext']." id=".$corr['id']." group=\"".$corr['grupper']."\" class=\"".$corr['ordklasse']."\" corrected-from=\"".$corr['typed']."\" corrected-to=\"".$corr['corrected']."\"$3error-end sourcetext=".$corr['sourcetext']." id=".$corr['id']."$4";
> $text = preg_replace ($replace,$with,$text,1);
>
> The above preg_replace works as expected. I have it inside a loop that processes 1000 texts and replaces a total of 3 words.
>
> The problem is that it accumulates memory up to 200MB, which isn't freed again until the script completes. It runs for about 1 - 2 minutes and allocates more and more memory.
>
> If I comment out the preg_replace the memory usage looks normal, so I'm pretty sure that is where the problem is.
>
> My question is basically: is it something in my reg. exp. that is totally nuts, or is it normal behaviour for preg_replace to allocate memory and not free it again until the script ends?
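One way to check whether the growth really comes from preg_replace is to compare `memory_get_usage()` before and after the loop. The sketch below is only an illustration under made-up assumptions: `markCorrection()` is a hypothetical helper, the texts are generated filler, and the `[...]` marker stands in for the real replacement string. The pattern is built the same way as in the thread, so a different regexp is compiled for each distinct `$count`.

```php
<?php
// Hypothetical helper: applies one correction after skipping $count words.
function markCorrection(string $text, int $count, string $typed, string $corrected): string
{
    // Same idea as in the thread: whitespace in the typed text becomes .*?
    $typedmask = preg_replace('/\s+/', '.*?', $typed);
    $replace = '/^((?:[a-z]+?[^a-z]+?){' . $count . '})(' . $typedmask . ')/i';
    $with = '$1[' . $corrected . ']';
    return preg_replace($replace, $with, $text, 1);
}

$first = null;
$last = null;
$before = memory_get_usage();

for ($i = 0; $i < 200; $i++) {
    $count = ($i % 30) + 1;                        // 30 distinct patterns
    $text = str_repeat('word ', $count) . 'go home';
    $last = markCorrection($text, $count, 'go', 'god');
    if ($i === 0) {
        $first = $last;
    }
}

// Only two results are kept, so growth is not dominated by stored output.
$growth = memory_get_usage() - $before;
printf("growth: %d bytes, peak: %d bytes\n", $growth, memory_get_peak_usage());
```

Running the same loop with the preg_replace call swapped for a plain string operation would give a baseline to compare against.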
Re: Re: [PHP] Regular Expression - it works but uses way too much memory ?
On Fri, 18 Jun 2004 08:57:19 +0200 (CEST), Ulrik S. Kofod [EMAIL PROTECTED] wrote:
> Sorry to post this again but it's a little urgent.
>
> The preg_replace in my script allocates a little memory every time it is called and doesn't free it again until the script ends. I don't know if it is normal behaviour for preg_replace or if it is my reg. exp. that causes this.
>
> The problem is that I call preg_replace a little more than 3 times and that causes quite a lot of memory to be allocated.
>
> $replace = "/^(([a-z]+?[^a-z]+?){".($count)."})(".$typedmask.")(.*)/iS";
> $with = "$1error-start sourcetext=".$corr['sourcetext']." id=".$corr['id']." group=\"".$corr['grupper']."\" class=\"".$corr['ordklasse']."\" corrected-from=\"".$corr['typed']."\" corrected-to=\"".$corr['corrected']."\"$3error-end sourcetext=".$corr['sourcetext']." id=".$corr['id']."$4";
> $text = preg_replace ($replace,$with,$text,1);
>
> The problem is that it accumulates memory up to 200MB, that isn't freed again until the script completes?

The S modifier that you're using means that it's storing the studied expression. If the regexp changes each time around the loop then over 3 iterations, that'll add up. See if removing that modifier helps at all.

  $replace = "/^(([a-z]+?[^a-z]+?){".($count)."})(".$typedmask.")(.*)/i";

If that's not it, then these *might* save you some memory, although I've not tested them:

I'm not entirely sure why you're matching (.*) at the end then putting it back in with your replacement text. Without running it, I'd have thought that you could leave out the (.*) from your pattern and the $4 from your replacement and get exactly the same effect.

  $replace = "/^(([a-z]+?[^a-z]+?){".($count)."})(".$typedmask.")/i";
  ... sourcetext=".$corr['sourcetext']." id=".$corr['id']."";

You could use a non-capturing subpattern for $2 which you're not using in your replacement.

  $replace = "/^((?:[a-z]+?[^a-z]+?){".($count)."})(".$typedmask.")/i";

And maybe a look-behind assertion for the first subpattern rather than a capturing pattern then re-inserting $1.

  $replace = "/^(?<=(?:[a-z]+?[^a-z]+?){".($count)."})(".$typedmask.")/i";
  $with = "error-start sourcetext=".$corr['sourcetext']." id=".$corr['id']." ...

-robin
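The non-capturing suggestion can be tried in isolation. The following is a minimal sketch on made-up input (the sample text, the count of 3 and the `[...]` marker are assumptions, not taken from the original script); it only shows that `(?:...)` removes one capture group and shifts the numbering of the ones after it, while the replaced text stays the same.

```php
<?php
// Hypothetical sample input.
$text = 'one two three go four';
$count = 3;

// Original style: the repeated word group is captured as $2 but never used.
$capturing = '/^(([a-z]+?[^a-z]+?){' . $count . '})(go)/i';
// Non-capturing style: (?:...) groups without storing a capture.
$nonCapturing = '/^((?:[a-z]+?[^a-z]+?){' . $count . '})(go)/i';

preg_match($capturing, $text, $m1);     // full match + 3 groups
preg_match($nonCapturing, $text, $m2);  // full match + 2 groups

// The typed word moves from $3 to $2 once the inner group stops capturing.
$r1 = preg_replace($capturing, '$1[$3]', $text, 1);
$r2 = preg_replace($nonCapturing, '$1[$2]', $text, 1);

echo count($m1), ' ', count($m2), "\n";
echo $r1, "\n", $r2, "\n";
```

Note that the replacement string has to be renumbered along with the pattern, which is why the corrected version elsewhere in the thread uses $2 and $3 instead of $3 and $4.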