Re: [Wikitech-l] Enabling some string functions

2009-06-26 Thread Robert Rohde
On Thu, Jun 25, 2009 at 10:35 PM, Tim Starlingtstarl...@wikimedia.org wrote:
snip
 The community of people who work on such templates is an extremely
 small, self-selected subset of the community of editors. It is that
 tiny segment of the community that can code in this accidental
 programming language, who are not deterred by its density,
 inconsistency or performance limitations.

There is some truth to this.  However, I believe the community of
people who would like to see string functions is much, much larger
than just the community of template coders.  Most Wikipedians can use
templates even if they don't feel comfortable creating them, and many
of them have at one time or another encountered practical problems
that could be solved with basic string functionality.

snip

 Introducing a
 scripting language will not make those accumulated contributions
 disappear. The task of deciphering them, and converting them to a more
 accessible form, will remain.

Do you actually have a plan for introducing a scripting language?

Lua, which seems to be your favored strategy, was recently LATER-ed on
bugzilla by Brion, and suffers from several serious problems.  For
example, the dependency on compiled binaries is highly undesirable.
The relative power of a full programming language would require
limiting its resources to avoid bad code consuming all memory or
flooding MediaWiki with output, and that is only the starting point
for considering the risks of malicious or overtaxing code.  Not to
mention that the comments at Extension talk:Lua suggest several people
have failed in their attempts to get the extension working at all.

Even if one gets past that, Lua brings its own grammar, set of
function keywords, and methodologies, which will again create a high
barrier to participation for people wanting to work with it.

Frankly, Lua feels like it creates at least as many usability and
portability problems as it solves, and it is still a long way off.

Werdna's suggestion to adapt the AbuseFilter parser into a home-grown
MediaWiki scripting language feels a lot more natural in terms of
control and the ability to provide an integrated presentation, but that
would also seem quite distant.


If one is going to say "no string functions until the template coding
problem is solved", then I'd like to know whether there is really a
serious strategy for solving it.

-Robert Rohde

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Current events-related overloads

2009-06-26 Thread Thomas Dalton
2009/6/26 Brion Vibber br...@wikimedia.org:
 Tim Starling wrote:
 It's quite a complex feature. If you have a server that deadlocks or
 is otherwise extremely slow, then it will block rendering for all
 other attempts, meaning that the article can not be viewed at all.
 That scenario could even lead to site-wide downtime, since threads
 waiting for the locks could consume all available apache threads, or
 all available DB connections.

 It's a reasonable idea, but implementing it would require a careful
 design, and possibly some other concepts like per-article thread count
 limits.

 *nod* We should definitely ponder the issue since it comes up
 intermittently but regularly with big news events like this. At the
 least if we can have some automatic threshold that temporarily disables
 or reduces hits on stampeded pages that'd be spiffy...

Of course, the fact that everyone's first port of call after hearing
such news is to check the Wikipedia page is a fantastic thing, so it
would be really unfortunate if we had to stop people doing that.
Would it be possible, perhaps, to direct all requests for a certain
page through one server, so the other servers can continue serving the
rest of the site unaffected? Or perhaps excessively popular pages could be
rendered (for anons) as part of the editing process, rather than the
viewing process, since that would mean each version of the article is
rendered only once (for anons) and would just slow down editing
slightly (presumably by a fraction of a second), which we can live
with. There must be something we can do that allows people to continue
viewing the page wherever possible.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Enabling some string functions

2009-06-26 Thread Roan Kattouw
2009/6/26 Stephen Bain stephen.b...@gmail.com:
 In the good old days someone would have solved the same problem by
 mentioning in the template's documentation that the parameter should
 use full URLs. Both the template and instances of it would be
 readable.

 Template programmers are not going to create accessible templates
 because they have a  programming mindset, and set out to solve
 problems in ways like Brian's code above.

Maybe it's the mindset that should be changed then? For one thing,
{{link}} used to use {{substr}} to check if the first argument started
with http:// , https:// or ftp:// and produced an internal link if
not, despite the fact that the documentation for {{link}} clearly
states that it creates an *external* link, which means people
shouldn't be using it to create internal links. If people try to use a
template for something it's not intended for, they should be told to
use a different template; currently, it seems like the template is
just extended with new functionality, leading to unnecessary {{#if:}},
{{#switch:}} and {{substr}} uses that serve only the users' laziness.

To get back to {{cite}}: the template itself contains no more than
some logic to choose between {{Citation/core}} and {{Citation/patent}}
based on the presence/absence of certain parameters, and
{{Citation/core}} does the same thing to choose between books and
periodicals. What's wrong with breaking up this template into, say,
{{cite patent}}, {{cite book}} and {{cite periodical}}? Similarly,
other multifunctional templates could be broken up as well.

The reason I believe breaking up templates improves performance is
this: they're typically of the form
{{#if:{{{someparam|}}}|{{foo}}|{{bar}}}}. The preprocessor will see
that this is a parser function call with three arguments, and expand
all three of them before it runs the #if hook. This means both {{foo}}
and {{bar}} get expanded, one of them in vain. Of course this is even
worse for complex systems of nested #if/#ifeq statements and/or
#switch statements, in which every possible 'code' path is evaluated
before a decision is made. In practice, this means that for every call
to {{cite}}, which seems to have three major modes, the preprocessor
will spend about 2/3 of its time expanding stuff it's gonna throw away
anyway.

To fix this, control flow parser functions such as #if could be put in
a special class of parser functions that take their arguments
unexpanded. They could then call the parser to expand their first
argument and return a value based on that. Whether these functions are
expected to return expanded or unexpanded wikitext doesn't really
matter from a performance standpoint. (Disclaimer: I'm hardly a parser
expert, Tim is; he should of course be the judge of the feasibility of
this proposal.)
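
For illustration, such a function could be registered with the
SFH_OBJECT_ARGS flag so that its arguments arrive as unexpanded
preprocessor nodes, and only the branch that is actually taken gets
expanded. A rough sketch (not the real ParserFunctions code, and the
hook name is made up):

$parser->setFunctionHook( 'if', 'wfLazyIf', SFH_OBJECT_ARGS );

function wfLazyIf( $parser, $frame, $args ) {
	// $args are PPNode objects; nothing is expanded until we ask for it.
	$test = isset( $args[0] ) ? trim( $frame->expand( $args[0] ) ) : '';
	$index = ( $test !== '' ) ? 1 : 2; // then-branch or else-branch
	// Only the chosen branch gets expanded; the other is never touched.
	return isset( $args[$index] ) ? trim( $frame->expand( $args[$index] ) ) : '';
}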

As an aside, lazy evaluation of #if statements would also improve
performance for stuff like:

{{#if:{{{param1|}}}|Do something with param1}}
{{#if:{{{param2|}}}|Do something with param2}}
...
{{#if:{{{param9|}}}|Do something with param9}}

Roan Kattouw (Catrope)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Enabling some string functions

2009-06-26 Thread Nikola Smolenski
Roan Kattouw wrote:
 To get back to {{cite}}: the template itself contains no more than
 some logic to choose between {{Citation/core}} and {{Citation/patent}}
 based on the presence/absence of certain parameters, and
 {{Citation/core}} does the same thing to choose between books and
 periodicals. What's wrong with breaking up this template in, say,
 {{cite patent}}, {{cite book}} and {{cite periodical}}? Similarly,
 other multifunctional templates could be broken up as well.

While this is not a comment on the merits of string functions in general, 
there are the following problems with that approach:

- It is easier for users to remember the name of just a single template.

- Multiple templates that are separately maintained will diverge over 
time; for example, the same parameter might end up being named differently.

- A new feature in one template can't be easily applied to another template.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Extending wikilinks syntax

2009-06-26 Thread Steve Bennett
On Fri, Jun 26, 2009 at 12:07 PM, Aryeh
Gregorsimetrical+wikil...@gmail.com wrote:

 From the editor's point of view.  Not from the view of the HTML
 source, which is what the original proposal was looking at.

I guess.

I'm starting to get the initial pangs of an idea that we should have
different kinds of syntax:

1) Article pages should only be allowed simplified syntax: no parser
functions, nothing funky at all. If you want to use weird features, you
must wrap them in a template.
2) Normal templates can use the full range of existing syntax
3) A limited number of admin-controlled special templates can use an
even wider range of features, including raw HTML.

Then, if you really needed specific HTML for a very specific, widely used
template, you could have it, without opening up any cans of worms.

[The benefit from 1) above is less unreadable wikitext in article
space, though I suspect that's fairly limited already, and unreadable
wikitext is mostly from refs and massive templates like {{cite}}.]

Steve

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] PHP 5.3.0 coming soon!

2009-06-26 Thread Aryeh Gregor
On Fri, Jun 26, 2009 at 6:24 AM, Andrew Garrettagarr...@wikimedia.org wrote:
 Hooray for closures!

 Do we have plans to update the cluster?

Does it matter if MediaWiki still has to work on PHP 5.0?

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Current events-related overloads

2009-06-26 Thread Aryeh Gregor
On Fri, Jun 26, 2009 at 6:33 AM, Thomas Daltonthomas.dal...@gmail.com wrote:
 Of course, the fact that everyone's first port of call after hearing
 such news is to check the Wikipedia page is a fantastic thing, so it
 would be really unfortunate if we have to stop people doing that.

He didn't say we'd shut down views for the article, just that we'd shut
down reparsing or cache invalidation or something.  This is the live
hack that was applied yesterday:

Index: includes/parser/ParserCache.php
===================================================================
--- includes/parser/ParserCache.php	(revision 52359)
+++ includes/parser/ParserCache.php	(working copy)
@@ -63,6 +63,7 @@
 		if ( is_object( $value ) ) {
 			wfDebug( "Found.\n" );
 			# Delete if article has changed since the cache was made
+			if( $article->mTitle->getPrefixedText() != 'Michael Jackson' ) { // temp hack!
 			$canCache = $article->checkTouched();
 			$cacheTime = $value->getCacheTime();
 			$touched = $article->mTouched;
@@ -82,6 +83,7 @@
 			}
 			wfIncrStats( 'pcache_hit' );
 		}
+			} // temp hack!
 	} else {
 		wfDebug( "Parser cache miss.\n" );
 		wfIncrStats( 'pcache_miss_absent' );

It just meant that people were seeing outdated versions of the article.

 Would it be possible, perhaps, to direct all requests for a certain
 page through one server so the rest can continue to serve the rest of
 the site unaffected?

Every page view involves a number of servers, and they're not all
interchangeable, so this doesn't make a lot of sense.

 Or perhaps excessively popular pages could be
 rendered (for anons) as part of the editing process, rather than the
 viewing process, since that would mean each version of the article is
 rendered only once (for anons) and would just slow down editing
 slightly (presumably by a fraction of a second), which we can live
 with.

You think that parsing a large page takes a fraction of a second?  Try
twenty or thirty seconds.

But this sounds like a good idea.  If a process is already parsing the
page, why don't we just have other processes display an old cached
version of the page instead of waiting or trying to reparse
themselves?  The worst that would happen is some users would get old
views for a couple of minutes.
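
A rough sketch of that fallback (hypothetical glue code, not a patch
against MediaWiki; getDirty() and parseAndCache() stand in for a cache
read that skips the staleness check and for the normal parse-and-store
path):

// Sketch only -- hypothetical helper, not actual MediaWiki code.
function viewWithStaleFallback( $article, $parserCache, $user ) {
	global $wgMemc;
	$key = 'parselock:' . md5( $article->mTitle->getPrefixedDBkey() );
	// add() is atomic, so only one process gets to parse; the 120s expiry
	// keeps a stuck or dead parser from blocking the article forever.
	if ( $wgMemc->add( $key, 1, 120 ) ) {
		$out = $article->parseAndCache(); // hypothetical: parse and store
		$wgMemc->delete( $key );
		return $out;
	}
	// Someone else is already parsing: hand back the stale cached copy.
	$stale = $parserCache->getDirty( $article, $user ); // hypothetical
	return $stale !== false ? $stale : null; // null: caller may wait and retry
}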

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] PHP 5.3.0 coming soon!

2009-06-26 Thread Chad
On Fri, Jun 26, 2009 at 9:48 AM, Aryeh
Gregorsimetrical+wikil...@gmail.com wrote:
 On Fri, Jun 26, 2009 at 6:24 AM, Andrew Garrettagarr...@wikimedia.org wrote:
 Hooray for closures!

 Do we have plans to update the cluster?

 Does it matter if MediaWiki still has to work on PHP 5.0?

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


I could be completely off here, but I thought the lowest supported
release was 5.1.x. Or that there was talk (somewhere?) of making
that the case.

-Chad

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Enabling some string functions

2009-06-26 Thread Aryeh Gregor
On Thu, Jun 25, 2009 at 11:33 PM, Tim Starlingtstarl...@wikimedia.org wrote:
 Those templates can be defeated by reducing the functionality of
 padleft/padright, and I think that would be a better course of action
 than enabling the string functions.

 The set of string functions you describe are not the most innocuous
 ones, they're the ones I most want to keep out of Wikipedia, at least
 until we have a decent server-side scripting language in parallel.

Well, then at least let's be consistent and cripple padleft/padright.

Also, while I disagree with Robert's skepticism about the comparative
usability of a real scripting language, I'd be interested to hear what
your ideas are for actually implementing that.

Come to think of it, the easiest scripting language to implement would
be . . . PHP!  Just run it through the built-in PHP parser, carefully
sanitize the tokens so that it's safe (possibly banning things like
function definitions), and eval()!  We could even dump the scripts
into lots of little files and use includes, so APC can cache them.
That would probably be the easiest thing to do, if we need to keep
pure PHP support for the sake of third parties.  It's kind of
horrible, of course . . .
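
For what it's worth, a crude sketch of the token-sanitizing idea
(illustration only, and not a claim that such a sandbox would actually
be safe; the helper name is made up):

// Tokenize submitted code and reject scripts that use banned constructs.
function evalSanitizedPhp( $code ) {
	$banned = array( T_EVAL, T_FUNCTION, T_INCLUDE, T_INCLUDE_ONCE,
		T_REQUIRE, T_REQUIRE_ONCE, T_GLOBAL, T_STATIC );
	foreach ( token_get_all( "<?php " . $code ) as $token ) {
		if ( is_array( $token ) && in_array( $token[0], $banned ) ) {
			return false; // banned language construct
		}
		if ( $token === '`' ) {
			return false; // backtick shell execution
		}
	}
	// Function calls would still need a whitelist check (omitted here),
	// since exec(), system() etc. show up as plain T_STRING tokens.
	return eval( $code );
}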

How much of Wikipedia is your random shared-hosted site going to be
able to mirror anyway, though?  Couldn't we at least require working
exec() to get infoboxes to work?  People on shared hosting could use
Special:ExpandTemplates to get a copy of the article with no
dependencies, too (albeit with rather messy source code).

On Fri, Jun 26, 2009 at 6:33 AM, Roan Kattouwroan.katt...@gmail.com wrote:
 The reason I believe breaking up templates improves performance is
 this: they're typically of the form
 {{#if:{{{someparam|}}}|{{foo}}|{{bar}}}}. The preprocessor will see
 that this is a parser function call with three arguments, and expand
 all three of them before it runs the #if hook.

I thought this was fixed ages ago with the new preprocessor.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Extending wikilinks syntax

2009-06-26 Thread Aryeh Gregor
On Fri, Jun 26, 2009 at 8:22 AM, Steve Bennettstevag...@gmail.com wrote:
 3) A limited number of admin-controlled special templates can use an
 even wider range of features, including raw HTML.

Admins are not going to be allowed to insert raw HTML.  At least, not
ordinary admins.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Current events-related overloads

2009-06-26 Thread Roan Kattouw
2009/6/26 Aryeh Gregor simetrical+wikil...@gmail.com:
 But this sounds like a good idea.  If a process is already parsing the
 page, why don't we just have other processes display an old cached
 version of the page instead of waiting or trying to reparse
 themselves?  The worst that would happen is some users would get old
 views for a couple of minutes.

This is a very good idea, and sounds much better than having those
other processes wait for the first process to finish parsing. It would
also reduce the severity of the deadlocks occurring when a process
gets stuck on a parse or dies in the middle of it: instead of
deadlocking, the other processes would just display stale versions
rather than wasting time. If we design these parser cache locks to
expire after a few minutes or so, it should work just fine.

Roan Kattouw (Catrope)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] PHP 5.3.0 coming soon!

2009-06-26 Thread Roan Kattouw
2009/6/26 Chad innocentkil...@gmail.com:
 I could be completely off here, but I thought the lowest supported
 release was 5.1.x. Or that there was talk (somewhere?) of making
 that the case.

Officially, MediaWiki supports PHP 5.0.x, but using it is not recommended
because it has some buggy array handling functions (I think those bugs
only existed on 64-bit platforms, though I'm not sure).

Roan Kattouw (Catrope)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Enabling some string functions

2009-06-26 Thread Roan Kattouw
 On Fri, Jun 26, 2009 at 6:33 AM, Roan Kattouwroan.katt...@gmail.com wrote:
 The reason I believe breaking up templates improves performance is
 this: they're typically of the form
 {{#if:{{{someparam|}}}|{{foo}}|{{bar}}}}. The preprocessor will see
 that this is a parser function call with three arguments, and expand
 all three of them before it runs the #if hook.

 I thought this was fixed ages ago with the new preprocessor.

I asked Domas whether it was and he said no; Tim, can you chip in on this?

Roan Kattouw (Catrope)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Enabling some string functions

2009-06-26 Thread Brian
On Fri, Jun 26, 2009 at 2:44 AM, Stephen Bain stephen.b...@gmail.comwrote:

 In the good old days someone would have solved the same problem by
 mentioning in the template's documentation that the parameter should
 use full URLs. Both the template and instances of it would be
 readable.

 Template programmers are not going to create accessible templates
 because they have a  programming mindset, and set out to solve
 problems in ways like Brian's code above.


The good old days are long gone. If you believe there is never a valid case
for basic programming constructs such as conditionals, you should have
objected when ParserFunctions were first implemented.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Extending wikilinks syntax

2009-06-26 Thread Andrew Garrett

On 26/06/2009, at 3:21 PM, Aryeh Gregor wrote:

 On Fri, Jun 26, 2009 at 8:22 AM, Steve Bennettstevag...@gmail.com  
 wrote:
 3) A limited number of admin-controlled special templates can use an
 even wider range of features, including raw HTML.

 Admins are not going to be allowed to insert raw HTML.  At least, not
 ordinary admins.


They already can, with Javascript, so there's no XSS issue.

--
Andrew Garrett
Contract Developer, Wikimedia Foundation
agarr...@wikimedia.org
http://werdn.us




___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Enabling some string functions

2009-06-26 Thread Andrew Garrett

On 26/06/2009, at 3:32 PM, Brian wrote:

 On Fri, Jun 26, 2009 at 2:44 AM, Stephen Bain  
 stephen.b...@gmail.comwrote:

 In the good old days someone would have solved the same problem by
 mentioning in the template's documentation that the parameter should
 use full URLs. Both the template and instances of it would be
 readable.

 Template programmers are not going to create accessible templates
 because they have a  programming mindset, and set out to solve
 problems in ways like Brian's code above.

 The good old days are long gone. If you believe there is never a  
 valid case
 for basic programming constructs such as conditionals you should have
 objected  when ParserFunctions were first implemented.


The fact that we, at some stage, made the mistake of adding  
programming-like functions does not oblige us to complete the job.

If we could make ParserFunctions go away, we would. ParserFunctions is  
there now, and there's too much code dependent on it to remove it  
right now. That analysis does not apply to StringFunctions.

--
Andrew Garrett
Contract Developer, Wikimedia Foundation
agarr...@wikimedia.org
http://werdn.us




___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Extending wikilinks syntax

2009-06-26 Thread Aryeh Gregor
On Fri, Jun 26, 2009 at 11:46 AM, Andrew Garrettagarr...@wikimedia.org wrote:
 They already can, with Javascript, so there's no XSS issue.

That ability may be removed in the future, and restricted to a smaller
and more select group.  Witness the problems we've been having with
admins including tracking software.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Enabling some string functions

2009-06-26 Thread Robert Rohde
On Fri, Jun 26, 2009 at 7:16 AM, Aryeh
Gregorsimetrical+wikil...@gmail.com wrote:
 On Fri, Jun 26, 2009 at 6:33 AM, Roan Kattouwroan.katt...@gmail.com wrote:
 The reason I believe breaking up templates improves performance is
 this: they're typically of the form
 {{#if:{{{someparam|}}}|{{foo}}|{{bar}}}}. The preprocessor will see
 that this is a parser function call with three arguments, and expand
 all three of them before it runs the #if hook.

 I thought this was fixed ages ago with the new preprocessor.

My understanding has been that the PREprocessor expands all branches,
by looking up and substituting transcluded templates and similar
things, but that the actual processor only evaluates the branches that
it needs.  That's a lot faster than actually evaluating all branches
(which is how things originally worked), but not quite as effective as
if the dead branches were ignored entirely.

(I could be totally wrong however.)

-Robert Rohde

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] subst'ing #if parser functions loses line breaks, and other oddities

2009-06-26 Thread Gerard Meijssen
Hoi,
At some stage Wikipedia was this thing that everybody can edit... I can not
and will not edit this shit so what do you expect from the average Joe ??
Thanks,
  Gerard

2009/6/25 Tisza Gergő gti...@gmail.com

 Tim Starling tstarling at wikimedia.org writes:

  {{subst:!}} no longer works as a separator between parser function
  parameters, it just works as a literal character. Welcome to MediaWiki
  1.12.

 Seems like it was intended to be the | in [[category:foo|bar]], except that
 someone left a | out of the code. Correctly it would be:

 {{{{{subst|}}}#if:{{{par1|}}}|[[Category:{{{par1}}}{{{{{subst|}}}#if:
 {{{key1|}}}|{{{{{subst|}}}!}}{{{key1}}}}}]]
 <!-- bpar1 -->
 }}{{{{{subst|}}}#if:{{{par2|}}}|[[Category:{{{par2}}}{{{{{subst|}}}#if:
 {{{key2|}}}|{{{{{subst|}}}!}}{{{key2}}}}}]]
 <!-- bpar2 -->
 }}{{{{{subst|}}}#if:{{{par3|}}}|[[Category:{{{par3}}}{{{{{subst|}}}#if:
 {{{key3|}}}|{{{{{subst|}}}!}}{{{key3}}}}}]]
 <!-- bpar3 -->
 }}

 (Note that I added extra linebreaks after #if: so that gmane doesn't
 complain about lines being too long.)

  The workarounds that come to mind for the line break issue are fairly
  obscure and complex. If I were you I'd just put the categories on the
  same line and be done with it.

 Just put the templates on separate lines and wrap the whole thing in
 another #if
 to discard additional newlines at the end:

 {{#if:1|
 {{{{{subst|}}}#if:{{{par1|1}}}|[[Category:{{{par1}}}{{{{{subst|}}}#if:
 {{{key1|1}}}|{{{{{subst|}}}!}}{{{key1}}}}}]]}}
 {{{{{subst|}}}#if:{{{par2|}}}|[[Category:{{{par2}}}{{{{{subst|}}}#if:
 {{{key2|}}}|{{{{{subst|}}}!}}{{{key2}}}}}]]}}
 {{{{{subst|}}}#if:{{{par3|}}}|[[Category:{{{par3}}}{{{{{subst|}}}#if:
 {{{key3|}}}|{{{{{subst|}}}!}}{{{key3}}}}}]]}}
 }}

 (This assumes that whenever par2 is missing, par3 is missing too.)


 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Enabling some string functions

2009-06-26 Thread Roan Kattouw
2009/6/26 Robert Rohde raro...@gmail.com:
 My understanding has been that the PREprocessor expands all branches,
 by looking up and substituting transcluded templates and similar
 things, but that the actual processor only evaluates the branches that
 it needs.  That's a lot faster than actually evaluating all branches
 (which is how things originally worked), but not quite as effective as
 if the dead branches were ignored entirely.

 (I could be totally wrong however.)

You're right that dead code never reaches the parser (your
processor), but ideally the preprocessor wouldn't bother expanding
it either. I have a vague recollection that it was fixed with the new
preprocessor, as Simetrical said, but I have no idea how much truth
there is in that.

Roan Kattouw (Catrope)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Minify

2009-06-26 Thread Sergey Chernyshev
It's probably worth mentioning that this bug is still open:
https://bugzilla.wikimedia.org/show_bug.cgi?id=17577

This will save not only traffic on subsequent page views (in this case:
http://www.webpagetest.org/result/090218_132826127ab7f254499631e3e688b24b/1/details/cached/
it's about 50K), but also improve performance dramatically.

I wonder if anything can be done to at least make it work for local files -
I have a hard time understanding the File vs. LocalFile vs. FSRepo relationships
well enough to enable this just for the local file system.

It's probably also wise to figure out a way for it to be implemented on
non-local repositories too so Wikimedia projects can use it, but I'm
completely out of my league here ;)

Thank you,

Sergey


--
Sergey Chernyshev
http://www.sergeychernyshev.com/


On Fri, Jun 26, 2009 at 11:42 AM, Robert Rohde raro...@gmail.com wrote:

 I'm going to mention this here, because it might be of interest on the
 Wikimedia cluster (or it might not).

 Last night I deposited Extension:Minify which is essentially a
 lightweight wrapper for the YUI CSS compressor and JSMin JavaScript
 compressor.  If installed it automatically captures all content
 exported through action=raw and precompresses it by removing comments,
 formatting, and other human readable elements.  All of the helpful
 elements still remain on the Mediawiki: pages, but they just don't get
 sent to users.

 Currently each page served to anons references 6 CSS/JS pages
 dynamically prepared by Mediawiki, of which 4 would be needed in the
 most common situation of viewing content online (i.e. assuming
 media=print and media=handheld are not downloaded in the typical
 case).

 These 4 pages, Mediawiki:Common.css, Mediawiki:Monobook.css, gen=css,
 and gen=js comprise about 60 kB on the English Wikipedia.  (I'm using
 enwiki as a benchmark, but Commons and dewiki also have similar
 numbers to those discussed below.)

 After gzip compression, which I assume is available on most HTTP
 transactions these days, they total 17039 bytes.  The comparable
 numbers if Minify is applied are 35 kB raw and 9980 after gzip, for a
 savings of 7 kB or about 40% of the total file size.

 Now in practical terms 7 kB could shave ~1.5s off a 36 kbps dialup
 connection.  Or given Erik Zachte's observation that action=raw is
 called 500 million times per day, and assuming up to 7 kB / 4 savings
 per call, could shave up to 900 GB off of Wikimedia's daily traffic.
 (In practice, it would probably be somewhat less.  900 GB seems to be
 slightly under 2% of Wikimedia's total daily traffic if I am reading
 the charts correctly.)


 Anyway, that's the use case (such as it is): slightly faster initial
 downloads and a small but probably measurable impact on total
 bandwidth.  The trade-off of course being that users receive CSS and
 JS pages from action=raw that are largely unreadable.  The extension
 exists if Wikimedia is interested, though to be honest I primarily
 created it for use with my own more tightly bandwidth constrained
 sites.

 -Robert Rohde

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Minify

2009-06-26 Thread Michael Dale
I would quickly add that the script-loader / new-upload branch also 
supports minification, along with associating unique ids, grouping and gzipping.

So all your MediaWiki page includes are tied to their version numbers 
and can be cached forever, without 304 requests by the client or a _shift_ 
reload to get new JS.

Plus it works with all the static file-based JS includes as well. If a 
given set of files is constantly requested we can group them to avoid 
server round trips. And finally it lets us localize messages and package them 
in the JS (again avoiding separate trips for JavaScript interface messages).

for more info see the ~slightly outdated~ document:  
http://www.mediawiki.org/wiki/Extension:ScriptLoader

peace,
michael
 
Robert Rohde wrote:
 I'm going to mention this here, because it might be of interest on the
 Wikimedia cluster (or it might not).

 Last night I deposited Extension:Minify which is essentially a
 lightweight wrapper for the YUI CSS compressor and JSMin JavaScript
 compressor.  If installed it automatically captures all content
 exported through action=raw and precompresses it by removing comments,
 formatting, and other human readable elements.  All of the helpful
 elements still remain on the Mediawiki: pages, but they just don't get
 sent to users.

 Currently each page served to anons references 6 CSS/JS pages
 dynamically prepared by Mediawiki, of which 4 would be needed in the
 most common situation of viewing content online (i.e. assuming
 media=print and media=handheld are not downloaded in the typical
 case).

 These 4 pages, Mediawiki:Common.css, Mediawiki:Monobook.css, gen=css,
 and gen=js comprise about 60 kB on the English Wikipedia.  (I'm using
 enwiki as a benchmark, but Commons and dewiki also have similar
 numbers to those discussed below.)

 After gzip compression, which I assume is available on most HTTP
 transactions these days, they total 17039 bytes.  The comparable
 numbers if Minify is applied are 35 kB raw and 9980 after gzip, for a
 savings of 7 kB or about 40% of the total file size.

 Now in practical terms 7 kB could shave ~1.5s off a 36 kbps dialup
 connection.  Or given Erik Zachte's observation that action=raw is
 called 500 million times per day, and assuming up to 7 kB / 4 savings
 per call, could shave up to 900 GB off of Wikimedia's daily traffic.
 (In practice, it would probably be somewhat less.  900 GB seems to be
 slightly under 2% of Wikimedia's total daily traffic if I am reading
 the charts correctly.)


 Anyway, that's the use case (such as it is): slightly faster initial
 downloads and a small but probably measurable impact on total
 bandwidth.  The trade-off of course being that users receive CSS and
 JS pages from action=raw that are largely unreadable.  The extension
 exists if Wikimedia is interested, though to be honest I primarily
 created it for use with my own more tightly bandwidth constrained
 sites.

 -Robert Rohde

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
   


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] subst'ing #if parser functions loses line breaks, and other oddities

2009-06-26 Thread Gregory Maxwell
On Fri, Jun 26, 2009 at 12:01 PM, Gerard
Meijssengerard.meijs...@gmail.com wrote:
 Hoi,
 At some stage Wikipedia was this thing that everybody can edit... I can not
 and will not edit this shit so what do you expect from the average Joe ??

I can not (effectively) contribute to
http://en.wikipedia.org/wiki/Ten_Commandments_in_Roman_Catholicism

Does this mean Wikipedia is a failure?

I don't think so.  Not everyone needs to be able to do everything.
Thats one reasons projects have communities: Other people can do the
work which I'm not interested in or not qualified for.  Not everyone
needs to make templates— and there are some people who'd have nothing
else to do but add fart jokes to science articles if the site didn't
have plenty of template mongering that needed doing.

Unfortunately the existing system is needlessly exclusive. The
existing parser-function solutions are so byzantine that even many
people with the right interest and knowledge are significantly put off
by them.

The distinction between this and general ease of use is a very
critical one.

It's also the case that the existing system's problems spill past its
borders due to its own limitations: regular users need to deal with
things like weird whitespace handling and templates which MUST be
substed (or can't be substed; at random from the user's perspective).
This makes the system harder even for the vast majority of people who
should never need to worry about the internals of the templates.

I think this is the most important issue, and it's one with real
usability impacts, but it's not due to the poor syntax. On this
point, the template language could be INTERCAL and still leave most
users completely free to ignore the messy insides. The existing system
doesn't because there is no clear boundary between the page and the
templates (among other reasons, like the limitations of the existing
'string' manipulation functions).

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Minify

2009-06-26 Thread Gregory Maxwell
On Fri, Jun 26, 2009 at 4:33 PM, Michael Dalemd...@wikimedia.org wrote:
 I would quickly add that the script-loader / new-upload branch also
 supports minify along with associating unique id's grouping  gziping.

 So all your mediaWiki page includes are tied to their version numbers
 and can be cached forever without 304 requests by the client or _shift_
 reload to get new js.

Hm. Unique ids?

Does this mean that every page on the site must be purged from the
caches to cause all requests to see a new version number?

Is there also some pending squid patch to let it jam in a new ID
number on the fly for every request? Or have I misunderstood what this
does?

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Minify

2009-06-26 Thread Michael Dale
Correct me if I am wrong, but that's how we presently update JS and CSS: 
we have $wgStyleVersion, and when that gets updated we send out fresh 
pages with HTML pointing to JS with $wgStyleVersion appended.

The difference in the context of the script-loader is that we would read the 
version from the MediaWiki JS pages that are being included and from the 
$wgStyleVersion var (avoiding the need to shift-reload)... in the 
context of rendering a normal page with dozens of template lookups I 
don't see this as particularly costly. It's a few extra getLatestRevID 
title calls. Likewise we should do this for images so we can send the 
cache-forever header (bug 17577), avoiding a bunch of 304 requests.
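
For the image case, a minimal sketch of the idea (hypothetical helper;
the actual change for bug 17577 would live in the File/Repo classes):

// Sketch only: make the URL change whenever the file is re-uploaded, so
// the response can carry a cache-forever header. getUrl() and
// getTimestamp() are existing File methods; the query string itself is
// arbitrary and just has to vary per version.
function versionedImageUrl( $file ) {
	if ( !$file || !$file->exists() ) {
		return false;
	}
	return $file->getUrl() . '?' . $file->getTimestamp();
}
// e.g. $src = versionedImageUrl( wfFindFile( 'Example.jpg' ) );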

One part I am not completely clear on is how we avoid lots of 
simultaneous requests to the scriptLoader when it first generates the 
JavaScript to be cached on the squids, but other stuff must be throttled 
too, no? Like when we update any code, language messages, or local 
settings, that does not result in the immediate purging of all of Wikipedia.

--michael

Gregory Maxwell wrote:
 On Fri, Jun 26, 2009 at 4:33 PM, Michael Dalemd...@wikimedia.org wrote:
   
 I would quickly add that the script-loader / new-upload branch also
 supports minify along with associating unique id's grouping  gziping.

 So all your mediaWiki page includes are tied to their version numbers
 and can be cached forever without 304 requests by the client or _shift_
 reload to get new js.
 

 Hm. Unique ids?

 Does this mean the every page on the site must be purged from the
 caches to cause all requests to see a new version number?

 Is there also some pending squid patch to let it jam in a new ID
 number on the fly for every request? Or have I misunderstood what this
 does?

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
   


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] subst'ing #if parser functions loses line breaks, and other oddities

2009-06-26 Thread Gerard Meijssen
Hoi,
In the past the existence of templates on one wiki has been used as an
argument not to accept an extension. With extensions you have functionality
that is indeed intended to be external to ordinary users, but you are talking
about functionality that can be tested. With templates you have stuff that
can and does severely impact performance and is at the same time not usable
on other systems.

While it may be so that you cannot effectively contribute to an article on
something as esoteric as the Ten Commandments in Roman Catholicism, it might
be possible for you to translate it into another language if you have the
language skills. The way templates are, I would not touch them with a
barge pole if I can help it. Templates are however the only tool we consider
for things like infoboxes and the like. They are as a result quite important
from a functional point of view. From a usability point of view they are
horrible.

In conclusion, templates are used and they prove to be problematic. The best
proof of this is the recent performance issues we had.
Thanks,
   GerardM

2009/6/26 Gregory Maxwell gmaxw...@gmail.com

 On Fri, Jun 26, 2009 at 12:01 PM, Gerard
 Meijssengerard.meijs...@gmail.com wrote:
  Hoi,
  At some stage Wikipedia was this thing that everybody can edit... I can
 not
  and will not edit this shit so what do you expect from the average Joe ??

 I can not (effectively) contribute to
 http://en.wikipedia.org/wiki/Ten_Commandments_in_Roman_Catholicism

 Does this mean Wikipedia is a failure?

 I don't think so.  Not everyone needs to be able to do everything.
 Thats one reasons projects have communities: Other people can do the
 work which I'm not interested in or not qualified for.  Not everyone
 needs to make templates— and there are some people who'd have nothing
 else to do but add fart jokes to science articles if the site didn't
 have plenty of template mongering that needed doing.

 Unfortunately the existing system is needlessly exclusive. The
 existing parser function uses solution are so byzantine that even many
 people with the right interest and knowledge are significantly put off
 from it.

 The distinction between this and a general easy to use is a very
 critical one.

 It's also the case that the existing system's problems spills past its
 borders due to its own limitations: Regular users need to deal with
 things like weird whitespace handling and templates which MUST be
 substed (or can't be substed; at random from the user's perspective).
 This makes the system harder even for the vast majority of people who
 should never need to worry about the internals of the templates.

 I think this is the most important issue, and its one with real
 usability impacts,  but it's not due to the poor syntax. On this
 point, the template language could be intercal but still leave most
 users completely free to ignore the messy insides. The existing system
 doesn't because there is no clear boundary between the page and the
 templates (among other reasons, like the limitations of the existing
 'string' manipulation functions).

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Minify

2009-06-26 Thread Michael Dale
Aryeh Gregor wrote:
 Any given image is not included on every single page on the wiki.
 Purging a few thousand pages from Squid on an image reupload (should
 be rare for such a heavily-used image) is okay.  Purging every single
 page on the wiki is not.

   
Yeah... we are just talking about adding image.jpg?image_revision_id to 
all the image src attributes at page render time; that should never purge 
everything on the wiki ;)
 No.  We don't purge Squid on these events, we just let people see old
 copies.  Of course, this doesn't normally apply to registered users
 (who usually [always?] get Squid misses), or to pages that aren't
 cached (edit, history, . . .).
   
Okay, that's basically what I understood. That makes sense... although it 
would be nice to think about a job or process that purges pages with 
outdated language messages, or pages that reference outdated scripts, 
style sheets, or image URLs.

We ~do~ add jobs to purge for template updates. Are other things, like 
language msg and code updates, candidates for job purge tasks? ... I guess 
it's not too big a deal to get an old page until someone updates it.

--michael

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Minify

2009-06-26 Thread Andrew Dunbar
2009/6/26 Robert Rohde raro...@gmail.com:
 I'm going to mention this here, because it might be of interest on the
 Wikimedia cluster (or it might not).

 Last night I deposited Extension:Minify which is essentially a
 lightweight wrapper for the YUI CSS compressor and JSMin JavaScript
 compressor.  If installed it automatically captures all content
 exported through action=raw and precompresses it by removing comments,
 formatting, and other human readable elements.  All of the helpful
 elements still remain on the Mediawiki: pages, but they just don't get
 sent to users.

 Currently each page served to anons references 6 CSS/JS pages
 dynamically prepared by Mediawiki, of which 4 would be needed in the
 most common situation of viewing content online (i.e. assuming
 media=print and media=handheld are not downloaded in the typical
 case).

 These 4 pages, Mediawiki:Common.css, Mediawiki:Monobook.css, gen=css,
 and gen=js comprise about 60 kB on the English Wikipedia.  (I'm using
 enwiki as a benchmark, but Commons and dewiki also have similar
 numbers to those discussed below.)

 After gzip compression, which I assume is available on most HTTP
 transactions these days, they total 17039 bytes.  The comparable
 numbers if Minify is applied are 35 kB raw and 9980 after gzip, for a
 savings of 7 kB or about 40% of the total file size.

 Now in practical terms 7 kB could shave ~1.5s off a 36 kbps dialup
 connection.  Or given Erik Zachte's observation that action=raw is
 called 500 million times per day, and assuming up to 7 kB / 4 savings
 per call, could shave up to 900 GB off of Wikimedia's daily traffic.
 (In practice, it would probably be somewhat less.  900 GB seems to be
 slightly under 2% of Wikimedia's total daily traffic if I am reading
 the charts correctly.)


 Anyway, that's the use case (such as it is): slightly faster initial
 downloads and a small but probably measurable impact on total
 bandwidth.  The trade-off of course being that users receive CSS and
 JS pages from action=raw that are largely unreadable.  The extension
 exists if Wikimedia is interested, though to be honest I primarily
 created it for use with my own more tightly bandwidth constrained
 sites.

This sounds great but I have a problem with making action=raw return
something that is not raw. For MediaWiki I think it would be better to
add a new action=minify

What would the pluses and minuses of that be?

Andrew Dunbar (hippietrail)


 -Robert Rohde

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
http://wiktionarydev.leuksman.com http://linguaphile.sf.net

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Minify

2009-06-26 Thread Sergey Chernyshev
It probably depends on how getTimestamp() is implemented for non-local repos.
The important thing is for it not to return new values too often, and to return
the real version of the image.

If this is already the case, can someone apply this patch then? I don't want
to be responsible for such an important change ;)

Sergey


On Fri, Jun 26, 2009 at 3:52 PM, Chad innocentkil...@gmail.com wrote:

 You're patching already-existing functionality at the File level, so it
 should be ok to just plop it in there. I'm not sure how this will affect
 the ForeignApi interface, so it'd be worth testing there too.

 From what I can tell at a (very) quick glance, it shouldn't adversely
 affect anything from a client perspective on the API, as we just
 rely on whatever URL was provided to us to begin with.

 -Chad

 On Fri, Jun 26, 2009 at 3:31 PM, Sergey
 Chernyshevsergey.chernys...@gmail.com wrote:
  Which of all those file to change to apply my patch only to files in
 default
  repository? Currently my patch is applied to File.php
 
  http://bug-attachment.wikimedia.org/attachment.cgi?id=5833
 
  If you just point me into right direction, I'll update the patch and
 upload
  it myself.
 
  Thank you,
 
 Sergey
 
 
  --
  Sergey Chernyshev
  http://www.sergeychernyshev.com/
 
 
  On Fri, Jun 26, 2009 at 3:17 PM, Chad innocentkil...@gmail.com wrote:
 
  The structure is LocalRepo extends FSRepo extends
  FileRepo. ForeignApiRepo extends FileRepo directly, and
  ForeignDbRepo extends LocalRepo.
 
  -Chad
 
  On Jun 26, 2009 3:15 PM, Sergey Chernyshev 
 sergey.chernys...@gmail.com
  wrote:
 
  It's probably worth mentioning that this bug is still open:
  https://bugzilla.wikimedia.org/show_bug.cgi?id=17577
 
  This will save not only traffic on subsequent page views (in this case:
 
 
  http://www.webpagetest.org/result/090218_132826127ab7f254499631e3e688b24b/1/details/cached/
  it's about 50K), but also improve performance dramatically.
 
  I wonder if anything can be done to at least make it work for local
 files -
  I have hard time understanding File vs. LocalFile vs. FSRepo
 relationships
  to enable this just for local file system.
 
  It's probably also wise to figure out a way for it to be implemented on
  non-local repositories too so Wikimedia projects can use it, but I'm
  completely out of the league here ;)
 
  Thank you,
 
Sergey
 
 
  --
  Sergey Chernyshev
  http://www.sergeychernyshev.com/
 
  On Fri, Jun 26, 2009 at 11:42 AM, Robert Rohde raro...@gmail.com
 wrote:
  
  I'm going to mention ...
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l
 
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l
 

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Current events-related overloads

2009-06-26 Thread Domas Mituzas

 This is a very good idea, and sounds much better than having those

the major problem with all dirty caching is that we have more than one  
caching layer, and of course, things abort.

The fact that people would be shown dirty versions instead of the proper 
article leads to a situation where, in cases like vandal fighting, 
people will see stale versions instead of waiting a few seconds and 
getting the real one.

In theory, the update flow could look like this:

1. Set "I'm working on this" in a parallelism coordinator or lock 
manager
2. Do all database transactions & commit
3. Parse
4. Set memcached object
5. Invalidate squid objects

Now, whether we parse, block or serve stale could be dynamic: e.g. if 
we detect more than x parallel parses we fall back to blocking for a few 
seconds; once we detect more than y blocked threads on the task, or the 
block expires and there's no fresh content yet (or there's a new 
copy...), then stale stuff can be served.
In a perfect world that asks for specialized software :)
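
For illustration, that dynamic decision might look something like this
(a sketch with hypothetical thresholds and lock-manager calls, nothing
like a finished design):

// Sketch only: decide what a request should do while an article is being
// re-rendered. $lockManager stands in for the parallelism coordinator
// above; countParsers()/countWaiters() are invented names.
function chooseRenderStrategy( $lockManager, $title, $maxParsers = 3, $maxWaiters = 20 ) {
	$running = $lockManager->countParsers( $title );
	$waiting = $lockManager->countWaiters( $title );
	if ( $running < $maxParsers ) {
		return 'parse'; // little contention: just parse it ourselves
	}
	if ( $waiting < $maxWaiters ) {
		return 'block'; // block for a few seconds and hope for a fresh copy
	}
	return 'stale'; // too many blocked threads: serve the dirty copy
}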

Do note, for the past quite a few years we did lots and lots of work to 
avoid serving stale content. I would not see dirty serving as 
something we should be proud of ;-)

Domas

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l