A long while ago, I looked at the parser and realized that the
recursive template expansion and argument handling led the parser
to expand all branches of #if and #switch statements before deciding
which one to include.

In other words, given {{#if: something | statements_A | statements_B}},
the parser was fully expanding both statements_A and statements_B
before checking the #if condition to decide which one to keep.
Obviously that is inefficient, and for very complicated conditional
templates it is potentially very expensive.
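
A minimal sketch of the eager-versus-lazy difference, written in Python
rather than the parser's own PHP. The expand() function and the
expansions log are hypothetical stand-ins for real template expansion,
not MediaWiki code; the only #if semantics assumed is that a condition
counts as true when it is non-empty after trimming whitespace.

```python
from typing import Callable

# Hypothetical log of which branches were actually expanded.
expansions: list[str] = []


def expand(name: str) -> str:
    """Stand-in for fully expanding a (possibly expensive) branch."""
    expansions.append(name)
    return f"<expanded {name}>"


def if_eager(condition: str, then_text: str, else_text: str) -> str:
    # Eager: both branches are expanded before the condition is checked.
    then_value = expand(then_text)
    else_value = expand(else_text)
    # Assumed #if rule: non-empty (after trimming) means true.
    return then_value if condition.strip() else else_value


def if_lazy(condition: str,
            then_branch: Callable[[], str],
            else_branch: Callable[[], str]) -> str:
    # Lazy: branches arrive as thunks; only the selected one is expanded.
    return then_branch() if condition.strip() else else_branch()


if __name__ == "__main__":
    expansions.clear()
    if_eager("something", "statements_A", "statements_B")
    print(expansions)  # ['statements_A', 'statements_B']

    expansions.clear()
    if_lazy("something",
            lambda: expand("statements_A"),
            lambda: expand("statements_B"))
    print(expansions)  # ['statements_A']
```

Passing each branch as a zero-argument callable is one way to defer
expansion until the condition is known; #switch could be treated the
same way by expanding only the matching case.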

The parser has changed so much since I last worked with it that I am
having difficulty figuring out whether this is still true.  Hopefully
someone has already gone through and improved the branch-handling
logic, but if not, I would suggest that this would also be a good
general target for improving template performance.

-Robert Rohde


On Sat, Jan 31, 2009 at 5:03 AM, Domas Mituzas <[email protected]> wrote:
> Hello,
>
> I understand the need for Cite; that's why it is still there :) But...
>
> - We format the Cite references list on only every 100th backend
> request, yet it takes 8.15% of backend response time (thanks to the
> parser cache; without it, Cite formatting would take 815% of cluster
> time, though developers should understand this hyperbole isn't
> exactly right ;-)
>
> - When parsing articles like one of the most popular today,
> [[en:Rod_Blagojevich_corruption_charges]], it takes 20s to produce the
> page, and 17s of that is spent in the Cite block, mostly executing
> {{cite}}. That makes every editor wait ages for the page to display,
> and because of the cache stampede after invalidation, it puts
> considerable stress on the site (see the numbers mentioned above).
>
> - This 8% is real (wall-clock) time, which includes waiting for
> search, databases, and plain CPU contention, which we do end up
> having today. Measured in CPU time it is much higher, so it may
> actually be a 20% CPU-time impact on our application farm. That's at
> least $100k worth of hardware (and rising), even new/modern hardware,
> just for citation formatting.
>
> So, here is a checklist of what can be done (simple to complex):
>
> [  ] - Simplification of {{cite}}
> [  ] - Separate cache for Cite, to avoid reparsing on minor edits
> that don't involve citations. I have no idea how much this would win,
> but there is a theoretical chance of shaving off 1% or so. ;)
> [  ] - Offload some templates like {{cite}} to actual PHP extensions
> (a can of worms, but, oh well, it can be a standardized process too)
> [  ] - Implement a proper scripting engine like Lua for metatemplates
> (http://pecl.php.net/package/lua; another can of worms, though yet
> again, it can be managed via a trusted set of people, on the top 20
> wikis or so).
> [  ] - A frustrated operations guy adding something like (return "";)
> in some random extension and syncing the live hack. Obviously there
> would be some "HAHA YOU THOUGHT I COULDN'T DO THIS" comments in there.
>
> I for one can directly participate in at least two of these options. ;-)
>
> Unfortunately, {{cite}} is the only template I can profile/account for
> right now; we don't have proper per-template profiling, but I hope to
> get it some day. Then we'd have more "war on ..." topics ;-D
>
> Generally, templates are a major part of our parsing, and that's over
> 50% of our current cluster CPU load.
> As we actually managed to hit 100% last week, something that hasn't
> happened for a while, some work has to be done here.
>
> Of course, new hardware will help for a while, but I for one get huge
> personal satisfaction from saving donation money. ;-)
>
> CHEERS!
> --
> Domas Mituzas -- http://dammit.lt/ -- [[user:midom]]
>
>
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
