Re: [Wikitech-l] Enabling some string functions
On Thu, Jun 25, 2009 at 10:35 PM, Tim Starling tstarl...@wikimedia.org wrote:
> [snip]
> The community of people who work on such templates is an extremely
> small, self-selected subset of the community of editors. It is that
> tiny segment of the community that can code in this accidental
> programming language, who are not deterred by its density,
> inconsistency or performance limitations.

There is some truth to this. However, I believe the community of people who would like to see string functions is much, much larger than just the community of template coders. Most Wikipedians can use templates even if they don't feel comfortable creating them, and many of them have at one time or another encountered practical problems that could be solved with basic string functionality.

> [snip]
> Introducing a scripting language will not make those accumulated
> contributions disappear. The task of deciphering them, and converting
> them to a more accessible form, will remain.

Do you actually have a plan for introducing a scripting language? Lua, which seems to be your favored strategy, was recently LATER-ed on Bugzilla by Brion, and suffers from several serious problems. For example, the dependency on compiled binaries is highly undesirable. The relative power of a full programming language would require limiting its resources to keep bad code from consuming all memory or flooding MediaWiki with output, and that is only the starting point for considering the risks of malicious or overtaxing code. Not to mention that the comments at Extension talk:Lua suggest several people have failed in attempts to get the extension working at all. Even if one gets past that, Lua brings its own grammar, set of function keywords, and methodologies, which will again create a high barrier to participation for people wanting to work with it. Frankly, Lua feels like it creates at least as many usability and portability problems as it solves, and is still a long way off.

Werdna's suggestion to adapt the AbuseFilter parser into a home-grown MediaWiki scripting language feels a lot more natural in terms of control and the ability to affect an integrated presentation, but that would also seem quite distant.

If one is going to say no string functions until the template coding problem is solved, then I'd like to know whether there is really a serious strategy for doing that.

-Robert Rohde
Re: [Wikitech-l] Current events-related overloads
2009/6/26 Brion Vibber br...@wikimedia.org:
> Tim Starling wrote:
>> It's quite a complex feature. If you have a server that deadlocks or
>> is otherwise extremely slow, then it will block rendering for all
>> other attempts, meaning that the article can not be viewed at all.
>> That scenario could even lead to site-wide downtime, since threads
>> waiting for the locks could consume all available apache threads, or
>> all available DB connections. It's a reasonable idea, but implementing
>> it would require a careful design, and possibly some other concepts
>> like per-article thread count limits.
>
> *nod* We should definitely ponder the issue since it comes up
> intermittently but regularly with big news events like this. At the
> least, if we can have some automatic threshold that temporarily
> disables or reduces hits on stampeded pages, that'd be spiffy...

Of course, the fact that everyone's first port of call after hearing such news is to check the Wikipedia page is a fantastic thing, so it would be really unfortunate if we had to stop people doing that.

Would it be possible, perhaps, to direct all requests for a certain page through one server so the rest can continue to serve the rest of the site unaffected? Or perhaps excessively popular pages could be rendered (for anons) as part of the editing process, rather than the viewing process, since that would mean each version of the article is rendered only once (for anons) and would just slow down editing slightly (presumably by a fraction of a second), which we can live with. There must be something we can do that allows people to continue viewing the page wherever possible.
Re: [Wikitech-l] Enabling some string functions
2009/6/26 Stephen Bain stephen.b...@gmail.com:
> In the good old days someone would have solved the same problem by
> mentioning in the template's documentation that the parameter should
> use full URLs. Both the template and instances of it would be readable.
> Template programmers are not going to create accessible templates
> because they have a programming mindset, and set out to solve problems
> in ways like Brian's code above.

Maybe it's the mindset that should be changed, then? For one thing, {{link}} used to use {{substr}} to check whether the first argument started with http:// , https:// or ftp:// and produced an internal link if not, despite the fact that the documentation for {{link}} clearly states that it creates an *external* link, which means people shouldn't be using it to create internal links. If people try to use a template for something it's not intended for, they should be told to use a different template; currently, it seems like the template is just extended with new functionality, leading to unnecessary {{#if:}}, {{#switch:}} and {{substr}} uses that serve only the users' laziness.

To get back to {{cite}}: the template itself contains no more than some logic to choose between {{Citation/core}} and {{Citation/patent}} based on the presence/absence of certain parameters, and {{Citation/core}} does the same thing to choose between books and periodicals. What's wrong with breaking this template up into, say, {{cite patent}}, {{cite book}} and {{cite periodical}}? Similarly, other multifunctional templates could be broken up as well.

The reason I believe breaking up templates improves performance is this: they're typically of the form {{#if:{{{someparam|}}}|{{foo}}|{{bar}}}}. The preprocessor will see that this is a parser function call with three arguments, and expand all three of them before it runs the #if hook. This means both {{foo}} and {{bar}} get expanded, one of which in vain. Of course this is even worse for complex systems of nested #if/#ifeq statements and/or #switch statements, in which every possible 'code' path is evaluated before a decision is made. In practice, this means that for every call to {{cite}}, which seems to have three major modes, the preprocessor will spend about 2/3 of its time expanding stuff it's going to throw away anyway.

To fix this, control flow parser functions such as #if could be put in a special class of parser functions that take their arguments unexpanded. They could then call the parser to expand their first argument and return a value based on that. Whether these functions are expected to return expanded or unexpanded wikitext doesn't really matter from a performance standpoint. (Disclaimer: I'm hardly a parser expert, Tim is; he should of course be the judge of the feasibility of this proposal.)

As an aside, lazy evaluation of #if statements would also improve performance for stuff like:

{{#if:{{{param1|}}}|Do something with param1
{{#if:{{{param2|}}}|Do something with param2
...
{{#if:{{{param9|}}}|Do something with param9}}

Roan Kattouw (Catrope)
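For the curious, a minimal sketch of what such a lazily evaluating #if could look like, assuming the new preprocessor's SFH_OBJECT_ARGS hook mode (which hands the callback unexpanded preprocessor nodes); the hook name 'lazyif' and the callback are illustrative, not existing code:

<?php
// Register so arguments arrive as unexpanded PPNode objects rather
// than fully expanded wikitext (SFH_OBJECT_ARGS mode).
$wgParser->setFunctionHook( 'lazyif', 'wfLazyIf', SFH_OBJECT_ARGS );

function wfLazyIf( $parser, $frame, $args ) {
	// Expand only the condition; both branches stay untouched for now.
	$test = isset( $args[0] ) ? trim( $frame->expand( $args[0] ) ) : '';
	if ( $test !== '' ) {
		// Expand just the "then" branch.
		return isset( $args[1] ) ? trim( $frame->expand( $args[1] ) ) : '';
	}
	// Expand just the "else" branch; the dead branch is never expanded.
	return isset( $args[2] ) ? trim( $frame->expand( $args[2] ) ) : '';
}

With this shape, {{#lazyif:x|{{foo}}|{{bar}}}} would only ever expand one of {{foo}} or {{bar}}, which is exactly the saving described above.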
Re: [Wikitech-l] Enabling some string functions
Roan Kattouw wrote:
> To get back to {{cite}}: the template itself contains no more than some
> logic to choose between {{Citation/core}} and {{Citation/patent}} based
> on the presence/absence of certain parameters, and {{Citation/core}}
> does the same thing to choose between books and periodicals. What's
> wrong with breaking this template up into, say, {{cite patent}}, {{cite
> book}} and {{cite periodical}}? Similarly, other multifunctional
> templates could be broken up as well.

While this is not a comment on the merits of string functions in general, there are the following problems with that approach:

- It is easier for users to remember the name of just a single template.
- Multiple templates that are separately maintained will diverge over time; for example, the same parameters might end up being named differently.
- A new feature in one template can't be easily applied to another template.
Re: [Wikitech-l] Extending wikilinks syntax
On Fri, Jun 26, 2009 at 12:07 PM, Aryeh Gregor simetrical+wikil...@gmail.com wrote:
> From the editor's point of view. Not from the view of the HTML source,
> which is what the original proposal was looking at.

I guess. I'm starting to get the initial pangs of an idea that we should have different kinds of syntax:

1) Article pages should only be allowed simplified syntax: no parser functions, nothing funky at all. If you want to use weird features, you must wrap them in a template.
2) Normal templates can use the full range of existing syntax.
3) A limited number of admin-controlled special templates can use an even wider range of features, including raw HTML.

Then, if you really need specific HTML for a very specific, widely used template, you could have it, without opening up any cans of worms.

[The benefit from 1) above is less unreadable wikitext in article space, though I suspect that's fairly limited already, and unreadable wikitext mostly comes from refs and massive templates like {{cite}}.]

Steve
Re: [Wikitech-l] PHP 5.3.0 coming soon!
On Fri, Jun 26, 2009 at 6:24 AM, Andrew Garrett agarr...@wikimedia.org wrote:
> Hooray for closures! Do we have plans to update the cluster?

Does it matter if MediaWiki still has to work on PHP 5.0?
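For anyone who hasn't followed 5.3 development: closures are anonymous functions that can capture variables from the enclosing scope. A small, MediaWiki-agnostic illustration in plain PHP 5.3 (names are made up for the example):

<?php
// An anonymous function bound to a variable, capturing $wiki from the
// enclosing scope with "use".
$wiki = 'enwiki';
$tag = function ( $msg ) use ( $wiki ) {
	return "[$wiki] $msg";
};
echo $tag( 'parser cache miss' ); // prints "[enwiki] parser cache miss"

// Closures also make one-off callbacks tidier:
$lengths = array_map( function ( $s ) { return strlen( $s ); },
	array( 'foo', 'quux' ) ); // gives array( 3, 4 )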
Re: [Wikitech-l] Current events-related overloads
On Fri, Jun 26, 2009 at 6:33 AM, Thomas Dalton thomas.dal...@gmail.com wrote:
> Of course, the fact that everyone's first port of call after hearing
> such news is to check the Wikipedia page is a fantastic thing, so it
> would be really unfortunate if we have to stop people doing that.

He didn't say we'd shut down views for the article, just that we'd shut down reparsing or cache invalidation or something. This is the live hack that was applied yesterday:

Index: includes/parser/ParserCache.php
===================================================================
--- includes/parser/ParserCache.php	(revision 52359)
+++ includes/parser/ParserCache.php	(working copy)
@@ -63,6 +63,7 @@
 		if ( is_object( $value ) ) {
 			wfDebug( "Found.\n" );
 			# Delete if article has changed since the cache was made
+			if( $article->mTitle->getPrefixedText() != 'Michael Jackson' ) { // temp hack!
 			$canCache = $article->checkTouched();
 			$cacheTime = $value->getCacheTime();
 			$touched = $article->mTouched;
@@ -82,6 +83,7 @@
 			}
 			wfIncrStats( 'pcache_hit' );
 		}
+		} // temp hack!
 	} else {
 		wfDebug( "Parser cache miss.\n" );
 		wfIncrStats( 'pcache_miss_absent' );

It just meant that people were seeing outdated versions of the article.

> Would it be possible, perhaps, to direct all requests for a certain
> page through one server so the rest can continue to serve the rest of
> the site unaffected?

Every page view involves a number of servers, and they're not all interchangeable, so this doesn't make a lot of sense.

> Or perhaps excessively popular pages could be rendered (for anons) as
> part of the editing process, rather than the viewing process, since
> that would mean each version of the article is rendered only once (for
> anons) and would just slow down editing slightly (presumably by a
> fraction of a second), which we can live with.

You think that parsing a large page takes a fraction of a second? Try twenty or thirty seconds.

But this sounds like a good idea. If a process is already parsing the page, why don't we just have other processes display an old cached version of the page instead of waiting or trying to reparse themselves? The worst that would happen is some users would get old views for a couple of minutes.
Re: [Wikitech-l] PHP 5.3.0 coming soon!
On Fri, Jun 26, 2009 at 9:48 AM, Aryeh Gregor simetrical+wikil...@gmail.com wrote:
> On Fri, Jun 26, 2009 at 6:24 AM, Andrew Garrett agarr...@wikimedia.org wrote:
>> Hooray for closures! Do we have plans to update the cluster?
>
> Does it matter if MediaWiki still has to work on PHP 5.0?

I could be completely off here, but I thought the lowest supported release was 5.1.x. Or that there was talk (somewhere?) of making that the case.

-Chad
Re: [Wikitech-l] Enabling some string functions
On Thu, Jun 25, 2009 at 11:33 PM, Tim Starling tstarl...@wikimedia.org wrote:
> Those templates can be defeated by reducing the functionality of
> padleft/padright, and I think that would be a better course of action
> than enabling the string functions. The set of string functions you
> describe are not the most innocuous ones, they're the ones I most want
> to keep out of Wikipedia, at least until we have a decent server-side
> scripting language in parallel.

Well, then at least let's be consistent and cripple padleft/padright. Also, while I disagree with Robert's skepticism about the comparative usability of a real scripting language, I'd be interested to hear what your ideas are for actually implementing that.

Come to think of it, the easiest scripting language to implement would be . . . PHP! Just run it through the built-in PHP parser, carefully sanitize the tokens so that it's safe (possibly banning things like function definitions), and eval()! We could even dump the scripts into lots of little files and use includes, so APC can cache them. That would probably be the easiest thing to do, if we need to keep pure PHP support for the sake of third parties. It's kind of horrible, of course . . .

How much of Wikipedia is your random shared-hosting site going to be able to mirror anyway, though? Couldn't we at least require working exec() to get infoboxes to work? People on shared hosting could use Special:ExpandTemplates to get a copy of the article with no dependencies, too (albeit with rather messy source code).

On Fri, Jun 26, 2009 at 6:33 AM, Roan Kattouw roan.katt...@gmail.com wrote:
> The reason I believe breaking up templates improves performance is
> this: they're typically of the form
> {{#if:{{{someparam|}}}|{{foo}}|{{bar}}}}. The preprocessor will see
> that this is a parser function call with three arguments, and expand
> all three of them before it runs the #if hook.

I thought this was fixed ages ago with the new preprocessor.
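To make the tokenize-and-sanitize idea concrete, here is a rough sketch using a blacklist; the function name and the particular banned-token list are illustrative only, and a real implementation would need far more than this (whitelisting which functions may be called, resource limits, and so on). token_get_all() is PHP's built-in tokenizer.

<?php
// Tokenize untrusted PHP and refuse to eval() it if any banned
// construct appears anywhere in the token stream.
function evalSanitized( $code ) {
	$banned = array( T_FUNCTION, T_EVAL, T_INCLUDE, T_INCLUDE_ONCE,
		T_REQUIRE, T_REQUIRE_ONCE, T_GLOBAL, T_EXIT );
	foreach ( token_get_all( '<?php ' . $code ) as $token ) {
		if ( is_array( $token ) && in_array( $token[0], $banned, true ) ) {
			return false; // banned construct: refuse to run
		}
		if ( $token === '`' ) {
			return false; // backtick shell execution
		}
	}
	return eval( $code );
}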
Re: [Wikitech-l] Extending wikilinks syntax
On Fri, Jun 26, 2009 at 8:22 AM, Steve Bennett stevag...@gmail.com wrote:
> 3) A limited number of admin-controlled special templates can use an
> even wider range of features, including raw HTML.

Admins are not going to be allowed to insert raw HTML. At least, not ordinary admins.
Re: [Wikitech-l] Current events-related overloads
2009/6/26 Aryeh Gregor simetrical+wikil...@gmail.com:
> But this sounds like a good idea. If a process is already parsing the
> page, why don't we just have other processes display an old cached
> version of the page instead of waiting or trying to reparse themselves?
> The worst that would happen is some users would get old views for a
> couple of minutes.

This is a very good idea, and sounds much better than having those other processes wait for the first process to finish parsing. It would also reduce the severity of the deadlocks that occur when a process gets stuck on a parse or dies in the middle of it: rather than deadlocking, the other processes would just display stale versions instead of wasting time. If we design these parser cache locks to expire after a few minutes or so, it should work just fine.

Roan Kattouw (Catrope)
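A minimal sketch of what such a lock could look like, using memcached's add() (which fails if the key already exists) as the lock primitive. The key name, the 120-second expiry and the serve-stale fallback are illustrative assumptions, not existing code; in particular, the real ParserCache::get() would need a way to skip its staleness check for this to work.

<?php
// Illustrative sketch: one process parses, the rest serve stale copies.
$key = wfMemcKey( 'parselock', $article->getTitle()->getPrefixedDBkey() );

if ( $wgMemc->add( $key, 1, 120 ) ) {
	// add() succeeded: nobody else holds the lock, so parse fresh.
	// The 120 s expiry means a crashed process can't deadlock others.
	$output = $wgParser->parse( $text, $article->getTitle(), $options );
	$parserCache->save( $output, $article, $wgUser );
	$wgMemc->delete( $key );
} else {
	// Someone else is already parsing: fetch whatever the parser cache
	// holds, accepting a stale rendering rather than starting a second
	// parse or blocking indefinitely.
	$output = $parserCache->get( $article, $wgUser );
}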
Re: [Wikitech-l] PHP 5.3.0 coming soon!
2009/6/26 Chad innocentkil...@gmail.com:
> I could be completely off here, but I thought the lowest supported
> release was 5.1.x. Or that there was talk (somewhere?) of making that
> the case.

Officially, MediaWiki supports PHP 5.0.x, but using it is not recommended because it has some buggy array handling functions (I think those bugs only existed on 64-bit platforms, not sure though).

Roan Kattouw (Catrope)
Re: [Wikitech-l] Enabling some string functions
On Fri, Jun 26, 2009 at 6:33 AM, Roan Kattouw roan.katt...@gmail.com wrote:
>> The reason I believe breaking up templates improves performance is
>> this: they're typically of the form
>> {{#if:{{{someparam|}}}|{{foo}}|{{bar}}}}. The preprocessor will see
>> that this is a parser function call with three arguments, and expand
>> all three of them before it runs the #if hook.
>
> I thought this was fixed ages ago with the new preprocessor.

I asked Domas whether it was and he said no; Tim, can you chip in on this?

Roan Kattouw (Catrope)
Re: [Wikitech-l] Enabling some string functions
On Fri, Jun 26, 2009 at 2:44 AM, Stephen Bain stephen.b...@gmail.com wrote:
> In the good old days someone would have solved the same problem by
> mentioning in the template's documentation that the parameter should
> use full URLs. Both the template and instances of it would be readable.
> Template programmers are not going to create accessible templates
> because they have a programming mindset, and set out to solve problems
> in ways like Brian's code above.

The good old days are long gone. If you believe there is never a valid case for basic programming constructs such as conditionals, you should have objected when ParserFunctions were first implemented.
Re: [Wikitech-l] Extending wikilinks syntax
On 26/06/2009, at 3:21 PM, Aryeh Gregor wrote:
> On Fri, Jun 26, 2009 at 8:22 AM, Steve Bennett stevag...@gmail.com wrote:
>> 3) A limited number of admin-controlled special templates can use an
>> even wider range of features, including raw HTML.
>
> Admins are not going to be allowed to insert raw HTML. At least, not
> ordinary admins.

They already can, with JavaScript, so there's no XSS issue.

--
Andrew Garrett
Contract Developer, Wikimedia Foundation
agarr...@wikimedia.org
http://werdn.us
Re: [Wikitech-l] Enabling some string functions
On 26/06/2009, at 3:32 PM, Brian wrote:
> On Fri, Jun 26, 2009 at 2:44 AM, Stephen Bain stephen.b...@gmail.com wrote:
>> In the good old days someone would have solved the same problem by
>> mentioning in the template's documentation that the parameter should
>> use full URLs. Both the template and instances of it would be
>> readable. Template programmers are not going to create accessible
>> templates because they have a programming mindset, and set out to
>> solve problems in ways like Brian's code above.
>
> The good old days are long gone. If you believe there is never a valid
> case for basic programming constructs such as conditionals you should
> have objected when ParserFunctions were first implemented.

The fact that we, at some stage, made the mistake of adding programming-like functions does not oblige us to complete the job. If we could make ParserFunctions go away, we would. ParserFunctions is there now, and there's too much code dependent on it to remove it right now. That analysis does not apply to StringFunctions.

--
Andrew Garrett
Contract Developer, Wikimedia Foundation
agarr...@wikimedia.org
http://werdn.us
Re: [Wikitech-l] Extending wikilinks syntax
On Fri, Jun 26, 2009 at 11:46 AM, Andrew Garrett agarr...@wikimedia.org wrote:
> They already can, with JavaScript, so there's no XSS issue.

That ability may be removed in the future, and restricted to a smaller and more select group. Witness the problems we've been having with admins including tracking software.
Re: [Wikitech-l] Enabling some string functions
On Fri, Jun 26, 2009 at 7:16 AM, Aryeh Gregor simetrical+wikil...@gmail.com wrote:
> On Fri, Jun 26, 2009 at 6:33 AM, Roan Kattouw roan.katt...@gmail.com wrote:
>> The reason I believe breaking up templates improves performance is
>> this: they're typically of the form
>> {{#if:{{{someparam|}}}|{{foo}}|{{bar}}}}. The preprocessor will see
>> that this is a parser function call with three arguments, and expand
>> all three of them before it runs the #if hook.
>
> I thought this was fixed ages ago with the new preprocessor.

My understanding has been that the PREprocessor expands all branches, by looking up and substituting transcluded templates and similar things, but that the actual processor only evaluates the branches that it needs. That's a lot faster than actually evaluating all branches (which is how things originally worked), but not quite as effective as if the dead branches were ignored entirely. (I could be totally wrong, however.)

-Robert Rohde
Re: [Wikitech-l] subst'ing #if parser functions loses line breaks, and other oddities
Hoi,
At some stage Wikipedia was this thing that everybody can edit... I can not and will not edit this shit, so what do you expect from the average Joe??
Thanks,
     Gerard

2009/6/25 Tisza Gergő gti...@gmail.com

> Tim Starling tstarling at wikimedia.org writes:
>
>> {{subst:!}} no longer works as a separator between parser function
>> parameters, it just works as a literal character. Welcome to
>> MediaWiki 1.12.
>
> Seems like it was intended to be the | in [[category:foo|bar]], except
> that someone forgot a | from the code. Correctly it would be:
>
> {{{{{subst|}}}#if:{{{par1|}}}|[[Category:{{{par1}}}{{{{{subst|}}}#if:
> {{{key1|}}}|{{{{{subst|}}}!}}{{{key1}}}}}]] <!-- bpar1 -->
> }}{{{{{subst|}}}#if:{{{par2|}}}|[[Category:{{{par2}}}{{{{{subst|}}}#if:
> {{{key2|}}}|{{{{{subst|}}}!}}{{{key2}}}}}]] <!-- bpar2 -->
> }}{{{{{subst|}}}#if:{{{par3|}}}|[[Category:{{{par3}}}{{{{{subst|}}}#if:
> {{{key3|}}}|{{{{{subst|}}}!}}{{{key3}}}}}]] <!-- bpar3 -->
> }}
>
> (Note that I added extra linebreaks after #if: so that gmane doesn't
> complain about lines being too long.)
>
>> The workarounds that come to mind for the line break issue are fairly
>> obscure and complex. If I were you I'd just put the categories on the
>> same line and be done with it.
>
> Just put the templates on separate lines and wrap the whole thing in
> another #if to discard additional newlines at the end:
>
> {{#if:1|
> {{{{{subst|}}}#if:{{{par1|}}}|[[Category:{{{par1}}}{{{{{subst|}}}#if:
> {{{key1|}}}|{{{{{subst|}}}!}}{{{key1}}}}}]]}}
> {{{{{subst|}}}#if:{{{par2|}}}|[[Category:{{{par2}}}{{{{{subst|}}}#if:
> {{{key2|}}}|{{{{{subst|}}}!}}{{{key2}}}}}]]}}
> {{{{{subst|}}}#if:{{{par3|}}}|[[Category:{{{par3}}}{{{{{subst|}}}#if:
> {{{key3|}}}|{{{{{subst|}}}!}}{{{key3}}}}}]]}}
> }}
>
> (This assumes that whenever par2 is missing, par3 is missing too.)
Re: [Wikitech-l] Enabling some string functions
2009/6/26 Robert Rohde raro...@gmail.com:
> My understanding has been that the PREprocessor expands all branches,
> by looking up and substituting transcluded templates and similar
> things, but that the actual processor only evaluates the branches that
> it needs. That's a lot faster than actually evaluating all branches
> (which is how things originally worked), but not quite as effective as
> if the dead branches were ignored entirely. (I could be totally wrong,
> however.)

You're right that dead code never reaches the parser (your "processor"), but ideally the preprocessor wouldn't bother expanding it either. I have a vague recollection that it was fixed with the new preprocessor, as Simetrical said, but I have no idea how much truth there is in that.

Roan Kattouw (Catrope)
Re: [Wikitech-l] Minify
It's probably worth mentioning that this bug is still open: https://bugzilla.wikimedia.org/show_bug.cgi?id=17577

This will save not only traffic on subsequent page views (in this case: http://www.webpagetest.org/result/090218_132826127ab7f254499631e3e688b24b/1/details/cached/ - it's about 50K), but also improve performance dramatically.

I wonder if anything can be done to at least make it work for local files - I have a hard time understanding the File vs. LocalFile vs. FSRepo relationships to enable this just for the local file system. It's probably also wise to figure out a way for it to be implemented on non-local repositories too, so Wikimedia projects can use it, but I'm completely out of my league here ;)

Thank you,

     Sergey

--
Sergey Chernyshev
http://www.sergeychernyshev.com/

On Fri, Jun 26, 2009 at 11:42 AM, Robert Rohde raro...@gmail.com wrote:
> I'm going to mention this here, because it might be of interest on the
> Wikimedia cluster (or it might not).
>
> Last night I deposited Extension:Minify which is essentially a
> lightweight wrapper for the YUI CSS compressor and the JSMin JavaScript
> compressor. If installed, it automatically captures all content
> exported through action=raw and precompresses it by removing comments,
> formatting, and other human-readable elements. All of the helpful
> elements still remain on the MediaWiki: pages; they just don't get sent
> to users.
>
> Currently each page served to anons references 6 CSS/JS pages
> dynamically prepared by MediaWiki, of which 4 would be needed in the
> most common situation of viewing content online (i.e. assuming
> media=print and media=handheld are not downloaded in the typical case).
> These 4 pages, MediaWiki:Common.css, MediaWiki:Monobook.css, gen=css,
> and gen=js comprise about 60 kB on the English Wikipedia. (I'm using
> enwiki as a benchmark, but Commons and dewiki also have similar numbers
> to those discussed below.) After gzip compression, which I assume is
> available on most HTTP transactions these days, they total 17039 bytes.
> The comparable numbers if Minify is applied are 35 kB raw and 9980
> bytes after gzip, for a savings of 7 kB or about 40% of the total file
> size.
>
> Now in practical terms, 7 kB could shave ~1.5 s off a 36 kbps dialup
> connection. Or, given Erik Zachte's observation that action=raw is
> called 500 million times per day, and assuming up to 7 kB / 4 savings
> per call, it could shave up to 900 GB off of Wikimedia's daily traffic.
> (In practice, it would probably be somewhat less. 900 GB seems to be
> slightly under 2% of Wikimedia's total daily traffic if I am reading
> the charts correctly.)
>
> Anyway, that's the use case (such as it is): slightly faster initial
> downloads and a small but probably measurable impact on total
> bandwidth. The trade-off of course being that users receive CSS and JS
> pages from action=raw that are largely unreadable.
>
> The extension exists if Wikimedia is interested, though to be honest I
> primarily created it for use with my own more tightly
> bandwidth-constrained sites.
>
> -Robert Rohde
Re: [Wikitech-l] Minify
I would quickly add that the script-loader / new-upload branch also supports minification, along with associating unique IDs, grouping, and gzipping. So all your MediaWiki page includes are tied to their version numbers and can be cached forever, without 304 requests by the client or needing a _shift_ reload to get new JS. Plus it works with all the static file-based JS includes as well.

If a given set of files is constantly requested, we can group them to avoid server round trips. And finally, it lets us localize messages and package them in the JS (again avoiding separate trips for JavaScript interface messages).

For more info see the ~slightly outdated~ document: http://www.mediawiki.org/wiki/Extension:ScriptLoader

peace,
michael

Robert Rohde wrote:
> I'm going to mention this here, because it might be of interest on the
> Wikimedia cluster (or it might not).
>
> Last night I deposited Extension:Minify which is essentially a
> lightweight wrapper for the YUI CSS compressor and the JSMin JavaScript
> compressor. If installed, it automatically captures all content
> exported through action=raw and precompresses it by removing comments,
> formatting, and other human-readable elements.
> [snip]
Re: [Wikitech-l] subst'ing #if parser functions loses line breaks, and other oddities
On Fri, Jun 26, 2009 at 12:01 PM, Gerard Meijssen gerard.meijs...@gmail.com wrote:
> Hoi,
> At some stage Wikipedia was this thing that everybody can edit... I can
> not and will not edit this shit so what do you expect from the average
> Joe ??

I can not (effectively) contribute to http://en.wikipedia.org/wiki/Ten_Commandments_in_Roman_Catholicism

Does this mean Wikipedia is a failure? I don't think so. Not everyone needs to be able to do everything. That's one reason projects have communities: other people can do the work which I'm not interested in or not qualified for. Not everyone needs to make templates, and there are some people who'd have nothing else to do but add fart jokes to science articles if the site didn't have plenty of template mongering that needed doing.

Unfortunately, the existing system is needlessly exclusive. The existing parser-function-based solutions are so byzantine that even many people with the right interest and knowledge are significantly put off. The distinction between this and a system that is genuinely easy to use is a critical one.

It's also the case that the existing system's problems spill past its borders due to its own limitations: regular users need to deal with things like weird whitespace handling and templates which MUST be substed (or can't be substed; at random, from the user's perspective). This makes the system harder even for the vast majority of people who should never need to worry about the internals of the templates. I think this is the most important issue, and it's one with real usability impacts, but it's not due to the poor syntax. On this point, the template language could be INTERCAL but still leave most users completely free to ignore the messy insides. The existing system doesn't, because there is no clear boundary between the page and the templates (among other reasons, like the limitations of the existing 'string' manipulation functions).
Re: [Wikitech-l] Minify
On Fri, Jun 26, 2009 at 4:33 PM, Michael Dale md...@wikimedia.org wrote:
> I would quickly add that the script-loader / new-upload branch also
> supports minification, along with associating unique IDs, grouping, and
> gzipping. So all your MediaWiki page includes are tied to their version
> numbers and can be cached forever, without 304 requests by the client
> or needing a _shift_ reload to get new JS.

Hm. Unique IDs? Does this mean that every page on the site must be purged from the caches to cause all requests to see a new version number? Is there also some pending Squid patch to let it jam in a new ID number on the fly for every request? Or have I misunderstood what this does?
Re: [Wikitech-l] Minify
Correct me if I am wrong, but that's how we presently update JS and CSS: we have $wgStyleVersion, and when that gets updated we send out fresh pages with HTML pointing to JS with $wgStyleVersion appended. The difference in the context of the script-loader is that we would read the version from the MediaWiki JS pages being included, as well as the $wgStyleVersion var (avoiding the need to shift-reload)... In the context of rendering a normal page with dozens of template lookups, I don't see this as particularly costly. It's a few extra getLatestRevID title calls.

Likewise, we should do this for images so we can send the cache-forever header (bug 17577), avoiding a bunch of 304 requests.

One part I am not completely clear on is how we avoid lots of simultaneous requests to the scriptLoader when it first generates the JavaScript to be cached on the squids, but other stuff must be throttled too, no? Like when we update any code, language messages, or LocalSettings, that does not result in the immediate purging of all of Wikipedia.

--michael

Gregory Maxwell wrote:
> On Fri, Jun 26, 2009 at 4:33 PM, Michael Dale md...@wikimedia.org wrote:
>> I would quickly add that the script-loader / new-upload branch also
>> supports minification, along with associating unique IDs, grouping,
>> and gzipping. So all your MediaWiki page includes are tied to their
>> version numbers and can be cached forever, without 304 requests by the
>> client or needing a _shift_ reload to get new JS.
>
> Hm. Unique IDs? Does this mean that every page on the site must be
> purged from the caches to cause all requests to see a new version
> number? Is there also some pending Squid patch to let it jam in a new
> ID number on the fly for every request? Or have I misunderstood what
> this does?
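A sketch of the versioned-URL idea applied to images, in the spirit of bug 17577. The wrapper function here is hypothetical; File::getUrl() and File::getTimestamp() are existing File methods:

<?php
// Append the file's upload timestamp to its URL at render time, so the
// URL changes exactly when the content does. The response can then
// carry a far-future Expires header and never needs a 304 check.
function versionedFileUrl( File $file ) {
	return $file->getUrl() . '?' . $file->getTimestamp();
}

// Renders as something like:
// <img src="http://upload.example.org/a/ab/Foo.jpg?20090626123456">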
Re: [Wikitech-l] subst'ing #if parser functions loses line breaks, and other oddities
Hoi,
In the past, the existence of templates on one wiki has been used as an argument not to accept an extension. With extensions you have functionality that is indeed intended to be external to ordinary users, but you are talking about functionality that can be tested. With templates you have stuff that can and does severely impact performance, and that is at the same time not usable on other systems.

While it may be so that you can not effectively contribute to an article on something as esoteric as the Ten Commandments in Roman Catholicism, it might be possible for you to translate it into another language if you have the language skills. With the way templates are, I would not touch them with a barge pole if I can help it. Templates are, however, the only tool we consider for things like infoboxes and such. They are as a result quite important from a functional point of view. From a usability point of view they are horrible.

In conclusion, templates are used and they prove to be problematic. The best proof of this is the recent performance issues we had.
Thanks,
     GerardM

2009/6/26 Gregory Maxwell gmaxw...@gmail.com

> On Fri, Jun 26, 2009 at 12:01 PM, Gerard Meijssen
> gerard.meijs...@gmail.com wrote:
>> Hoi,
>> At some stage Wikipedia was this thing that everybody can edit... I
>> can not and will not edit this shit so what do you expect from the
>> average Joe ??
>
> I can not (effectively) contribute to
> http://en.wikipedia.org/wiki/Ten_Commandments_in_Roman_Catholicism
>
> Does this mean Wikipedia is a failure? I don't think so. Not everyone
> needs to be able to do everything.
> [snip]
Re: [Wikitech-l] Minify
Aryeh Gregor wrote:
> Any given image is not included on every single page on the wiki.
> Purging a few thousand pages from Squid on an image reupload (should be
> rare for such a heavily-used image) is okay. Purging every single page
> on the wiki is not.

Yeah... we are just talking about adding image.jpg?image_revision_id to all the image srcs at page render time; it should never purge everything on the wiki ;)

> No. We don't purge Squid on these events, we just let people see old
> copies. Of course, this doesn't normally apply to registered users
> (who usually [always?] get Squid misses), or to pages that aren't
> cached (edit, history, . . .).

OK, that's basically what I understood. That makes sense... although it would be nice to think about a job or process that purges pages with outdated language messages, or pages that are referencing outdated scripts, style sheets, or image URLs. We ~do~ add jobs to purge for template updates. Are other things, like language message or code updates, candidates for job purge tasks? ... I guess it's not too big a deal to get an old page until someone updates it.

--michael
Re: [Wikitech-l] Minify
2009/6/26 Robert Rohde raro...@gmail.com:
> I'm going to mention this here, because it might be of interest on the
> Wikimedia cluster (or it might not).
>
> Last night I deposited Extension:Minify which is essentially a
> lightweight wrapper for the YUI CSS compressor and the JSMin JavaScript
> compressor. If installed, it automatically captures all content
> exported through action=raw and precompresses it by removing comments,
> formatting, and other human-readable elements.
> [snip]
> Anyway, that's the use case (such as it is): slightly faster initial
> downloads and a small but probably measurable impact on total
> bandwidth. The trade-off of course being that users receive CSS and JS
> pages from action=raw that are largely unreadable.

This sounds great, but I have a problem with making action=raw return something that is not raw. For MediaWiki I think it would be better to add a new action=minify. What would the pluses and minuses of that be?

Andrew Dunbar (hippietrail)

--
http://wiktionarydev.leuksman.com
http://linguaphile.sf.net
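Whichever action ends up serving it, the core of such a minifying step would be small. A hedged sketch: the helper name is made up; JSMin::minify() is the entry point of the PHP JSMin port that Extension:Minify wraps, and the CSS branch here is just a naive regex stand-in for the YUI compressor:

<?php
// Return a minified version of $text according to its content type.
function minifyForAction( $text, $contentType ) {
	if ( $contentType === 'text/javascript' ) {
		// JSMin strips comments and insignificant whitespace.
		return JSMin::minify( $text );
	}
	if ( $contentType === 'text/css' ) {
		// Crude CSS squeeze: drop /* ... */ comments, collapse whitespace.
		$text = preg_replace( '!/\*.*?\*/!s', '', $text );
		return trim( preg_replace( '/\s+/', ' ', $text ) );
	}
	return $text; // anything else passes through untouched
}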
Re: [Wikitech-l] Minify
It probably depends on how getTimestamp() is implemented for non-local repos. The important thing is not to have it return new values too often, and to return the real version of the image. If this is already the case, can someone apply this patch then - I don't want to be responsible for such an important change ;)

Sergey

On Fri, Jun 26, 2009 at 3:52 PM, Chad innocentkil...@gmail.com wrote:
> You're patching already-existing functionality at the File level, so it
> should be OK to just plop it in there. I'm not sure how this will
> affect the ForeignApi interface, so it'd be worth testing there too.
> From what I can tell at a (very) quick glance, it shouldn't adversely
> affect anything from a client perspective on the API, as we just rely
> on whatever URL was provided to us to begin with.
>
> -Chad
>
> On Fri, Jun 26, 2009 at 3:31 PM, Sergey Chernyshev
> sergey.chernys...@gmail.com wrote:
>> Which of all those files do I change to apply my patch only to files
>> in the default repository? Currently my patch is applied to File.php:
>> http://bug-attachment.wikimedia.org/attachment.cgi?id=5833
>>
>> If you just point me in the right direction, I'll update the patch and
>> upload it myself.
>>
>> Thank you,
>>
>> Sergey
>>
>> On Fri, Jun 26, 2009 at 3:17 PM, Chad innocentkil...@gmail.com wrote:
>>> The structure is LocalRepo extends FSRepo extends FileRepo.
>>> ForeignApiRepo extends FileRepo directly, and ForeignDbRepo extends
>>> LocalRepo.
>>>
>>> -Chad
>>>
>>> On Jun 26, 2009 3:15 PM, Sergey Chernyshev
>>> sergey.chernys...@gmail.com wrote:
>>>> It's probably worth mentioning that this bug is still open:
>>>> https://bugzilla.wikimedia.org/show_bug.cgi?id=17577
>>>> [snip]
Re: [Wikitech-l] Current events-related overloads
> This is a very good idea, and sounds much better than having those

The major problem with all dirty caching is that we have more than one caching layer, and of course, things abort. The fact that people would be shown dirty versions instead of the proper article leads to a situation where, in the case of vandal fighting etc., people will see stale versions instead of waiting a few seconds and getting the real one.

In theory, the update flow could look like this:

1. Set "I'm working on this" in a parallelism coordinator or lock manager
2. Do all database transactions; commit
3. Parse
4. Set memcached object
5. Invalidate Squid objects

Whether we parse, block or serve stale could be dynamic: e.g. if we detect more than X parallel parses, we fall back to blocking for a few seconds; once we detect more than Y blocked threads on the task, or the block expires and there's no fresh content yet (or there's a new copy..), then stale stuff can be served. In a perfect world this asks for specialized software :)

Do note, for the past quite a few years we did lots and lots of work to avoid stale content being served. I would not see dirty serving as something we should be proud of ;-)

Domas
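To make the decision logic concrete, a rough pseudo-PHP sketch of that flow: the $coordinator object and its methods (workingOn, parallelParses, waitFor) are entirely hypothetical, while SquidUpdate::purge(), $wgMemc and Title::getInternalURL() are existing MediaWiki pieces.

<?php
// Hypothetical parallelism coordinator deciding between parsing,
// blocking and serving stale, following the five steps above.
$state = $coordinator->workingOn( $title );                  // step 1
if ( $state === 'idle' ) {
	$dbw->commit();                                          // step 2: commit DB work
	$output = $wgParser->parse( $text, $title, $options );   // step 3: parse
	$wgMemc->set( $cacheKey, $output, 86400 );               // step 4: fill memcached
	SquidUpdate::purge( array( $title->getInternalURL() ) ); // step 5: purge Squid
} elseif ( $coordinator->parallelParses( $title ) < $maxParallel ) {
	// Under the threshold: block a few seconds for the fresh copy.
	$output = $coordinator->waitFor( $title, 5 /* seconds */ );
} else {
	// Too many waiters, or the block expired with nothing fresh:
	// fall back to whatever stale copy the cache still holds.
	$output = $wgMemc->get( $cacheKey );
}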