[Resending mail; the first copy I sent apparently did not make it to the list]

Hi,

apologies for the long mail. Here is a summary:

We are seeing excessive memory usage on some of our MediaWiki websites. I 
tried to trace the problem, but my practical findings leave me somewhat 
puzzled. In particular, I initially expected Semantic MediaWiki to cause some 
of the problem, but it now seems that the problems are caused by the fact that 
SMW can easily generate long outputs, not by the memory SMW needs to create 
these outputs. I am not getting any further with debugging the problem, and I 
do not know how to improve SMW/MW to avoid it.

== How to do memory profiling? ==

I tried to enable PHP memory profiling in xdebug, but all I got was time data, 
and I gave up on this for now. The aggregated outputs in profileinfo.php were 
not very useful for me either; in particular, I think that they do not take 
garbage collection into account, i.e. they only show new memory allocations, 
but not the freeing of old memory. So one piece of code may allocate 20M but 
never need more than 4M at a time, while another allocates the same amount and 
keeps it due to some memory leak. As a result, the sums and percentages 
apparently do not show the real impact that a piece of code has on PHP's 
memory usage.
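To illustrate the difference between total allocations and memory actually 
held, here is a minimal sketch using PHP's built-in memory_get_usage() and 
memory_get_peak_usage(). The functions churn() and hoard() and the sizes are 
made up for illustration; they are not MW or SMW code.

```php
<?php
// Hypothetical demo functions; the names and sizes are illustrative only.

// Allocates $chunks strings of $size bytes, but frees each one right away.
function churn($chunks, $size) {
    for ($i = 0; $i < $chunks; $i++) {
        $chunk = str_repeat('x', $size);
        unset($chunk); // memory is released before the next allocation
    }
}

// Allocates the same total amount and keeps every string alive.
function hoard($chunks, $size) {
    $kept = array();
    for ($i = 0; $i < $chunks; $i++) {
        $kept[] = str_repeat('x', $size);
    }
    return $kept;
}

$mb = 1024 * 1024;
$base = memory_get_usage();

// 20 allocations of 1M each, never more than one alive at a time.
churn(20, $mb);
$heldAfterChurn = memory_get_usage() - $base;
$peakAfterChurn = memory_get_peak_usage() - $base;

// The same 20 allocations, all still referenced afterwards.
$kept = hoard(20, $mb);
$heldAfterHoard = memory_get_usage() - $base;
$peakAfterHoard = memory_get_peak_usage() - $base;

printf("churn: held %.1fM, peak %.1fM\n", $heldAfterChurn / $mb, $peakAfterChurn / $mb);
printf("hoard: held %.1fM, peak %.1fM\n", $heldAfterHoard / $mb, $peakAfterHoard / $mb);
```

Both functions allocate the same total, so a profile that only sums 
allocations cannot tell them apart, while the peak value is the one that 
decides whether a given memory limit is hit.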

So I based my memory estimates on the smallest PHP memory limit that did not 
produce a blank page when creating a page preview 
(ini_set('memory_limit',...);). This measure is rather coarse for debugging, 
but it might be the one number that matters most to the user. The results were 
reproducible.
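The search for the smallest working limit can be sketched as a bisection. 
This is only an outline under assumptions: find_min_memory_limit() and the 
probe callback are hypothetical names, the fake probe stands in for an actual 
page preview, and the search assumes that a limit which works keeps working 
when it is raised. It needs PHP 5.3 or later because of the closure.

```php
<?php
// Returns the smallest limit (in MB) within [$lowMb, $highMb] for which
// $rendersOk($limitMb) is true, or null if even $highMb fails.
// Assumes success is monotonic in the limit.
function find_min_memory_limit($rendersOk, $lowMb, $highMb) {
    if (!$rendersOk($highMb)) {
        return null;
    }
    while ($lowMb < $highMb) {
        $mid = (int) floor(($lowMb + $highMb) / 2);
        if ($rendersOk($mid)) {
            $highMb = $mid;     // works: the answer is $mid or lower
        } else {
            $lowMb = $mid + 1;  // blank page: the answer is above $mid
        }
    }
    return $lowMb;
}

// Stand-in probe for illustration only: pretends the page needs 50 MB.
$fakeProbe = function ($limitMb) {
    return $limitMb >= 50;
};

$minLimit = find_min_memory_limit($fakeProbe, 1, 256);
echo $minLimit, "M\n";
```

A real probe would have to render the page under the given limit, for example 
in a separate PHP process started with "php -d memory_limit=...", and report 
whether the output was non-empty; that part is left out here.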

== Sudden memory explosion and other findings ==

The strangest observation I made on my local machine (Kubuntu Linux, no caches 
whatsoever, MW 1.16alpha r56781, PHP 5.2.6-3ubuntu4.2) was that there tends to 
be a sharp boundary between "no memory problem" and "massive memory 
consumption". Even long pages could be generated with as little as 4M of PHP 
memory. However, if they got just a little too long (one additional line in a 
table), then a memory limit of 50M or more was needed to get a result. I 
disabled extensions for the test. The table I used for testing was 
about 62K with roughly 150 lines, and it used HTML tags, CSS classes, and MW 
links.

The findings really indicate a kind of explosion, not a gradual increase. 
Each extension I disabled increased the maximal size of the table by a few 
rows, but the explosion still happened in the same way. Of course, it is not 
clear how useful the PHP memory limit method of measurement is here.

== What to do about it? ==

How can I fix this? I noticed that it helps to shorten the input (e.g. I can 
render longer tables if the CSS class names in the tables are shorter!), and 
also to simplify the input (replacing links with plain text). The table I use 
is based on HTML syntax; I have not yet tried whether MW's pipe syntax leads 
to better performance. So one option would be to try to simplify SMW's table 
code (maybe making it less readable).

But this would only shift the problem towards longer tables. Caching would 
certainly also reduce memory consumption, but I have observed very high memory 
usage even on sites with APC as a PHP bytecode cache (as I said: loading less 
code simply moved the problem by a few lines). I have not tried object caching 
(memcached or APC) yet -- is it expected to help here? Squid does not solve 
the problem, since the page still needs to be rendered at some point, and the 
memory limit must be high enough to allow this. Also, if there is a cache miss 
and rebuilding the cache takes very long, then another cache miss can occur 
while the first request is still rebuilding the page -- this killed one of our 
servers recently.

Besides this, I would still like to know which measures in SMW or MW could 
help to reduce the problem at its source. Maybe it would help to know which 
aspects of parsing a table have the highest impact on MW's memory usage. 
Or is this a PHP issue?


Any relevant insights/pointers are welcome,

Markus


-- 
Markus Krötzsch  <[email protected]>
* Personal page: http://korrekt.org
* Semantic MediaWiki: http://semantic-mediawiki.org
* Semantic Web textbook: http://semantic-web-book.org
--

