~some comments inline~

Tim Starling wrote:

[snip]
> I started off working on fixing the coding style and the most glaring
> errors from the JS2 branch, but I soon decided that I shouldn't be
> putting so much effort into that when a lot of the code would have to
> be deleted or rewritten from scratch.
>   

I agree there are some core components that should be separated out and 
refactored. And some core pieces that you probably focused on do need 
to be removed and rewritten, as they have aged quite a bit (parts of 
mv_embed.js were created in SoC '06). I did not focus on the ~best~ 
core loader that could have been created; I have just built on what I 
already had available, which has "worked" reasonably well for the 
application set that I was targeting. It's been an iterative process 
which I feel is moving in the right direction, as I will outline below.

Obviously more input is helpful, and I am open to implementing most of 
the changes you describe where they make sense. But exclusion and dismissal 
are less helpful... unless that is your intended end, in which 
case just say so ;)

It's normal for a 3rd-party observer to say the whole system should be 
scrapped and rewritten. Of course, starting from scratch makes it much 
easier to design an ideal system and decide what it should/could be.

> I did a survey of script loaders in other applications, to get an idea
> of what features would be desirable. My observations came down to the
> following:
>
> * The namespacing in Google's jsapi is very nice, with everything
> being a member of a global "google" object. We would do well to
> emulate it, but migrating all JS to such a scheme is beyond the scope
> of the current project.
>   

You somewhat contradict this approach by recommending against the "class" 
abstraction below; i.e., how will you cleanly load components and 
dependencies if not by a given name?

I agree we should move things into a global object, e.g. $j, and all our 
components / features should extend that object (like jQuery plugins). 
That is the direction we are already going.

Dependency loading is not really beyond the scope... we already 
support that. If you check out the mv_jqueryBindings function in 
mv_embed.js, you'll see we have loader calls integrated into the jQuery 
binding. This integrates loading the high-level application interfaces 
into their interface call.

The idea is to move more and more of the structure of the application 
into that system. Right now mwLoad is a global function, but it should be 
refactored into the jQuery space and called via $j.load().
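
To make the direction concrete, here is a minimal, hypothetical sketch of 
what a dependency-aware $j.load()-style call could look like, with the 
network step stubbed out (classRegistry and fetchScripts are illustrative 
names, not the actual mv_embed API):

```javascript
// Hypothetical sketch of a dependency-aware loader in the mwLoad/$j.load()
// spirit. classRegistry and fetchScripts are illustrative, not real API.
var classRegistry = {};   // className -> true once its script has executed

// Stand-in for the real network step, which would append <script> tags or
// issue a grouped script-loader request, then run the returned scripts.
function fetchScripts( classNames, done ) {
    classNames.forEach( function ( name ) {
        classRegistry[ name ] = true;   // simulate each script defining its class
    } );
    done();
}

// load(): skip classes that are already defined, fetch only the missing
// ones, and invoke the callback once everything is available.
function load( classNames, callback ) {
    var missing = classNames.filter( function ( name ) {
        return !classRegistry[ name ];
    } );
    if ( missing.length === 0 ) {
        callback();
        return;
    }
    fetchScripts( missing, callback );
}
```

A second load() call naming an already-loaded class would fire its 
callback immediately, which is the reuse behavior described later in this 
mail.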
> * You need to deal with CSS as well as JS. All the script loaders I
> looked at did that, except ours. We have a lot of CSS objects that
> need concatenation, and possibly minification.
>   

Brion did not set that as a high priority when I inquired about it, but of 
course we should add in style grouping as well. It's not that I said we 
should exclude that from our script-loader; it was just a matter of setting 
priority, and I agree it is a high priority.
> * JS loading can be deferred until near the </body> or until the
> DOMContentLoaded event. This means that empty-cache requests will
> render faster. Wordpress places emphasis on this.
>   

True. I agree that we should put the script includes at the bottom. Also, 
all non-core JS2 scripts are already loaded via the DOMContentLoaded ready 
event. Ideally we should only provide "loaders" and maybe some small bit 
of configuration for the client-side applications they provide, as 
briefly described here: 
http://www.mediawiki.org/wiki/JS2_Overview#How_to_structure_your_JavaScript_application
> * Dependency tracking is useful. The idea is to request a given
> module, and all dependencies of that module, such as other scripts,
> will automatically be loaded first.
>   

As mentioned above, we do some dependency tracking by binding jQuery 
helpers that do that setup internally at a per-application-interface level.
We could add that convention directly into the script-loader function if 
desired, so that we include dependencies at a per-class level; e.g. 
mwLoad('ui.dialog') would know to load ui.core etc.
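
A convention like that could be sketched as a small dependency table plus 
a resolver; the table contents here are illustrative, not the real jQuery 
UI manifest:

```javascript
// Hypothetical per-class dependency table: asking for 'ui.dialog' should
// pull in 'ui.core' (and anything else it needs) first. Entries are
// illustrative, not the actual jQuery UI dependency graph.
var dependencies = {
    'ui.dialog': [ 'ui.core', 'ui.draggable' ],
    'ui.draggable': [ 'ui.core' ],
    'ui.core': []
};

// Return the full load order for a class: dependencies first, each class
// listed once even if several things depend on it.
function resolve( className, seen ) {
    seen = seen || {};
    if ( seen[ className ] ) {
        return [];
    }
    seen[ className ] = true;
    var order = [];
    ( dependencies[ className ] || [] ).forEach( function ( dep ) {
        order = order.concat( resolve( dep, seen ) );
    } );
    order.push( className );
    return order;
}
```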
>
>
> I then looked more closely at the current state of script loading in
> MediaWiki. I made the following observations:
>
> * Most linked objects (styles and scripts) on a typical page view come
> from the Skin. If the goal is performance enhancement, then working on
> the skins and OutputPage has to be a priority.
>   

Agreed. The script-loading was more urgent for my application task set. 
But for the common case of per-page-view performance, CSS grouping has 
bigger wins.
> * The "class" abstraction as implemented in JS2 has very little value
> to PHP callers. It's just as easy to use filenames. 
The idea with the "class" abstraction is that you don't know what script set 
you have available at any given time. Maybe one script included 
ui.resizable and ui.move, and now your script depends on ui.resizable, 
ui.move and ui.drag... your loader call will only include ui.drag 
(since the others are already defined).

This avoids re-parsing and re-including the same JavaScript file as part 
of a separate group request or src include. Alternatively, you can guard 
against including the same script when you're just using raw src includes, 
but that is a bit tricky when using the script-loader. Also, the 
class-define checks and the class/file convention are compatible with 
fetching a JavaScript file via XHR and evaluating the result (which is the 
way some frameworks include JavaScript, to ensure a consistent onLoaded 
callback)...

Which brings us to another point about class / file bindings: they let us 
test the typeof of the variable a script should define, and then issue a 
callback once we definitely know that the script is loaded and ready.

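That typeof check could look something like this minimal sketch (the 
function name and polling interval are assumptions, not the actual 
mv_embed code):

```javascript
// Sketch of the class/file binding idea: poll until the global variable a
// script is expected to define exists, then fire the ready callback.
// onClassReady is an illustrative name, not the real mv_embed function.
function onClassReady( globalObj, className, callback, interval ) {
    interval = interval || 25;
    if ( typeof globalObj[ className ] !== 'undefined' ) {
        callback();   // class is defined: the script is loaded and ready
        return;
    }
    // Not defined yet: check again shortly.
    setTimeout( function () {
        onClassReady( globalObj, className, callback, interval );
    }, interval );
}
```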
The trade-off for grouping distinct class-set requests is cacheability 
for return visits vs. script reuse vs. fastest display time for uncached 
visits vs. server resource cost. Also, perhaps some scripts can always be 
grouped, while other components are rarely included individually. But 
that changes with application development. Combining scripts is not too 
costly relative to the round-trip time... and we could pre-minify.

It's "optimal" to avoid the script-loader altogether and just have a 
single small core file, updated with a short expiry, that sets the version 
number of each script. Then everything else could have a long expiry, 
since it is tagged by version number. That would be "optimal", but a slower 
first-load experience. And we would still have to cache and package 
localizations per language.

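The version-stamp idea can be sketched like this; the map contents and URL 
format below are purely illustrative:

```javascript
// Sketch of the "single small core file" idea: a short-expiry script ships
// this version map, and every other resource URL embeds its version so it
// can be served with a far-future expiry. File names and values are made up.
var scriptVersions = {
    'mv_embed.js': 103,
    'ui.dialog.js': 98
};

// Build a cache-safe URL: the version changes whenever the script changes,
// so long-cached copies are never stale.
function versionedUrl( basePath, file ) {
    return basePath + file + '?version=' + scriptVersions[ file ];
}
```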
I have not done a definitive evaluation of the trade-offs and am open to 
more thoughts on that front.

> It could be made
> more useful with features such as dependency tracking, better
> concatenation and CSS support. But it seems to me that the most useful
> abstraction for PHP code would be for client-side modules to be
> multi-file, potentially with supporting PHP code for each module.
>   
We want to move away from PHP code dependencies for each JavaScript 
module. JavaScript should just directly hit a single exposure point of 
the MediaWiki API. If we have PHP code generating bits and pieces of 
JavaScript everywhere, it quickly gets complicated, is difficult to 
maintain, is much more resource-intensive, and requires a whole new 
framework to work right.

PHP's integration with the JavaScript should be minimal: PHP should 
supply configuration and package in localized messages.

> * Central registration of all client-side resources in a global
> variable would be onerous and should be avoided.
>   

You can always add to the registered global. This works well by having 
the PHP read the JavaScript file directly to ascertain the global list. 
That way your JavaScript works stand-alone as well as integrated with a 
script-loader that provides localization and configuration.

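As an illustration of that approach (the addClassPaths name and 
declaration format below are hypothetical, not the actual mv_embed 
convention), a script can declare its classes in a structured call that a 
simple server-side scan of the source can recover:

```javascript
// Hypothetical JS source where each script declares the classes it
// provides; a server-side scan of the file text recovers the list without
// executing the JavaScript. Names and paths are illustrative.
var sampleSource =
    "addClassPaths( {\n" +
    "    'mvSequencer': 'mwEmbed/libSequencer/mvSequencer.js',\n" +
    "    'mvTimedText': 'mwEmbed/libTimedText/mvTimedText.js'\n" +
    "} );";

// Pull out the declared class names with a regex, much as server-side
// code could do by reading the file from disk.
function scanClassNames( source ) {
    var names = [], re = /'([^']+)'\s*:/g, m;
    while ( ( m = re.exec( source ) ) !== null ) {
        names.push( m[ 1 ] );
    }
    return names;
}
```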
> * Dynamic requests such as [[MediaWiki:Handheld.css]] have a large
> impact on site performance and need to be optimised. I'm planning a
> new interface, similar to action=raw, allowing these objects to be
> concatenated.
>   

Sounds good ;) The present script-loader does this for JavaScript: it 
takes the most recent revision number of the included pages and ties the 
grouped version to that. I think it has to be integrated into page 
output so you can have a long expiry time.
>
>
> The following design documents are in my user space on mediawiki.org:
>
> <http://www.mediawiki.org/wiki/User:Tim_Starling/CSS_and_JS_caller_survey_(r56220)>
>   - A survey of MW functions that add CSS and JS, especially the
> terribly confusing situation in Skin and OutputPage
>   
I did a small commit, r56746, to try and start to clean that up... but it 
is a mess.
> <http://www.mediawiki.org/wiki/User:Tim_Starling/JS_load_order_issues_(r56220)>
>   - A breakdown of JS files by the issues that might be had in moving
> them to the footer or DOMContentLoaded. I favour a conservative
> approach, with wikibits.js and the site and user JS staying in the
> <head>.
>   
A separate, somewhat related effort should be to deprecate all non-jQuery- 
style helpers. A lot of the functions in wikibits.js, for example, could 
use jQuery functions or be refactored into a few lines of jQuery, which 
may make it unnecessary to have those global function abstractions to 
begin with. I am in favor of moving things to the bottom of the page. 
Likewise, all new JavaScript should be compatible with being run at 
DOMContentLoaded time.

> <http://www.mediawiki.org/wiki/User:Tim_Starling/Proposed_modularisation_of_client-side_resources>
>   - A proposed reorganisation of core scripts (Skin and OutputPage)
> according to the MW modules they are most associated with.
>
>
>
> The object model I'm leaning towards on the PHP side is:
>
> * A client-side resource manager (CSRM) class. This would be
> responsible for maintaining a list of client-side resources that have
> been requested and need to be sent to the skin. It would also handle
> caching, distribution of incoming dynamic requests, dependencies,
> minification, etc. This is quite a complex job and might need to be
> split up somewhat.
>   
That sounds cleaner than the present OutputPage and Skin.php and the 
associated script-loader grafting. Having a cleaner system would be 
nice... but will it break skins and other stuff? Or will it keep 
old-API mappings in OutputPage and Skin, or require changing almost every 
extension and breaking every 3rd-party skin out there?

You could probably have something "working" fairly quickly; the trick is 
compatibility with the broken old system. It is a core issue, and people 
working on other projects have added on the functionality needed to "get 
it working" with existing stuff... If you want to clean it up, I don't 
think anyone will protest, as long as it does not take away features or 
require major reworking of other code.
> * A hierarchy of client-side module classes. A module object would
> contain a list of files, dependencies and concatenation hints. Objects
> would be instantiated by parent classes such as skins and special
> pages, and added to the CSRM. Classes could be registered globally,
> and then used to generate dynamic CSS and JS, such as the user
> preference stylesheet.
>   
The main problem with defining all the objects and hierarchy relationships 
in PHP is that it won't work stand-alone. An ideal system retains the 
flexibility to work either with the script-loader or without it. 
Ultimately your JavaScript code will dictate what class is required, when 
and where. If you have to go back to PHP to define this all the time, 
that won't be fun.

Additionally, how do you describe call chains that happen purely in JS? 
Say you do a search to insert an image, then you decide you want to look 
for video, so now we load a video clip. The server can't map out that the 
client needs the native handler to be packaged with the JavaScript instead 
of the Cortado video handler. We have to run the detection client-side, 
then get the code. The server could know that if you request the Cortado 
handler you also need the parent video object, but it seems cleaner to 
map out that dependency in JavaScript instead of on the PHP side. And say 
you now want to run the code without the script-loader: it won't work at all.

> * The module base class would be non-abstract and featureful, with a
> constructor that accepts an array-based description. This allows
> simple creation of modules by classes with no interest in dynamic
> script generation.
>   
What are you planning on including in this array besides the path to the 
JavaScript file? Again, it will suck for the JavaScript author to go back 
into PHP and define all the dependencies instead of just listing them as 
needed in the JS. Furthermore, how will this work with scripts in the 
MediaWiki namespace? How will they define the classes and dependencies they 
need, if not in the JavaScript?

I think the PHP should read the JavaScript for this information, as is 
presently done with the script-loader.
> * A new script loader entry point would provide an interface to
> registered modules.
>   
The script-loader is already defined as part of the JavaScript loader, so 
the name of the entry point does not matter so much as the calling 
conventions.
>
>
> There are some design decisions I still have to make, which are tricky
> due to performance tradeoffs:
>
> * With concatenation, there is the question of which files to combine
> and which to leave separate. I would like to have a "combine"
> parameter which is a string, and files with the same combine parameter
> will be combined.
>   
Right... see the discussion above. I think in practice, ad-hoc grouping via 
post-page-load JavaScript interface requests will naturally group and 
cache common requests together, by nature of consistent JavaScript 
application flow. So I don't think the concatenation "hit" will be that 
substantial. JavaScript grouped at the page-loading level will of course 
want to avoid grouping something that will later be included 
by itself on a separate page.
> * Like Wordpress, we could store minified and concatenated files in a
> public cache and then link to that cache directly in the HTML.
>   
That seems perfectly reasonable... Is the idea that this will help small 
sites that don't have things behind a Squid proxy? Although small sites 
seem to work okay with MediaWiki pages being served via PHP reading 
cached files.
> * The cache invalidation scheme is tricky, there's not really an ideal
> system. A combination of cache-breaking parameters (like Michael's
> design) and short expiry times is probably the way to go. Using
> cache-breaking parameters alone doesn't work because there is
> referring HTML cached on both the server and client side, and
> regenerating that HTML periodically would be much more expensive than
> regenerating the scripts.
>   
An option is to write out a bit of dynamic JavaScript to a single 
short-expiry, statically cached core script that sets the versions for 
everything that could be included. But that does not work well with live 
hacks (hence the checking of the file-modified date)... If version updates 
are generally highly correlated with localization updates anyway, I don't 
see too much problem with old JavaScript persisting until a page is 
purged and rendered with the new interface.

I don't see the benefit in hurting our cache rate to support ~new 
javascript~ with ~old html~.

New JavaScript could depend on new HTML, no? (Like an added configuration 
variable, or a new div element?) You could add that level of complexity to 
the CSRM concept... or just tie the JavaScript to a given HTML page. (This 
reuses the cached JavaScript if the JavaScript has not been updated, at 
the cost of re-rendering the HTML, as is done with other updates.)


> Here are my notes:
>
> * Concatenation
>   * Performance problems:
>     * Changing inclusions. When inclusions change, whole contents has
> to be sent again.
>       * BUT people don't change skins very often.
>       * So combine=all=skin should save time for most
>     * Expiry times have to be synchronised. Take the minimum expiry of
> all, and force freshness check for all.
>     * Makes the task of squid cache purging more difficult
>     * Defeats browser concurrency
>
>   * Performance advantages:
>     * For dynamic requests:
>       * Avoids MW startup time.
>       * Avoids DoSing small servers with concurrent requests.
>     * For all requests:
>       * Reduces squid CPU
>       * Removes a few RTTs for non-pipelining clients
>       * Improves gzip compression ratio
>
> * Combine to static file idea:
>   * Pros:
>     * Fast to stream out, on all systems
>     * Doesn't break HughesNet
>   * Cons:
>     * Requires splitting the request into static and dynamic
>     * Need webserver config to add Expires header and gzip
>   

We could support both if we build the logic into the JS, as is done with 
the present system. The present script-loader works both ways by feeding 
the loader info from the JavaScript files (although it does not send the 
client to cached group requests if the script-loader is off). But a 
simple addition of a maintenance script could output the combined 
script sets into a public dir, based on the loader set definitions from the JS.

> With some help from Splarka, I've determined that it would be possible
> to merge the requests for [[MediaWiki:Common.css]],
> [[MediaWiki:Skinname.css]], [[MediaWiki:Handheld.css]] and
> [[MediaWiki:Print.css]], using @media blocks for the last two, for a
> significant performance win in almost all cases.
>   
Sounds good.
>
>
> Once the architectural issues have been fixed, the stylistic issues in
> both ancient JS and the merged code will have to be dealt with, for
> example:
>
> * Poorly-named functions, classes, files, etc. There's a need for
> proper namespacing and consistency in naming style.
>   
Yea, there is a bit of an identity crisis based on the inherited code. But 
variable renaming is not too hard. Also, a transition is already under 
way from the old style to a more jQuery style.
> * Poorly-written comments
>   
True. (No defense there... except to say that I am dyslexic.)
> * Unnecessary use of the global namespace. The jQuery style is nice,
> with local functions inside an anonymous closure:
>
> function () {
>    function setup() {
>        ...
>    }
>    addOnloadHook( setup );
> }();
>   
Right; as mentioned above, I am moving in that direction. See 
mv_jqueryBindings(), and even read the comment right above it:
 * @@ eventually we should refactor mwCode over to jQuery style plugins
 *      and mv_embed.js will just handle dependency mapping and loading.
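
For example, the jQuery-plugin form of that closure pattern keeps setup() 
private and exports only the plugin function (the plugin name and the 
minimal jQuery stand-in below are illustrative):

```javascript
// Minimal stand-in for the real jQuery object, so the sketch is
// self-contained; in a browser this would be the actual jQuery/$.
var $ = { fn: {} };

// Everything inside the closure stays private; only $.fn.myPlugin
// (an illustrative name) is exported to the shared namespace.
( function ( $ ) {
    // setup() is local to the closure: it never touches the global scope.
    function setup( msg ) {
        return 'setup: ' + msg;
    }
    $.fn.myPlugin = function ( msg ) {
        return setup( msg );
    };
} )( $ );
```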
> * Unsafe construction of HTML. This is ubiquitous in the mwEmbed
> directory and there will be a huge potential for XSS, as soon as user
> input is added. HTML construction with innerHTML can be replaced by
> document.createElement() or its jQuery equivalent.
>   

I build a lot of HTML as static strings because it's faster than 
generating every element with function calls. If you can inject 
arbitrary content into some JavaScript string, then I imagine you can do 
so with createElement as well. You don't gain much by escaping 
already-defined JavaScript. If you can do something to get a value into 
someone else's JavaScript instance, then you might as well call your 
evilJs directly. Perhaps I am understanding this wrong? Could you 
illustrate how it would be exploited in one case but not the other?

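To make the comparison concrete, here is a minimal illustration of the two 
construction styles, with an escaping helper standing in for 
document.createTextNode() / jQuery's .text() (escapeHtml and the sample 
payload are illustrative):

```javascript
// A sample untrusted value, as might come from user input.
var userInput = '<img src=x onerror=alert(1)>';

// Unsafe style: concatenating the raw input into an HTML string means the
// payload is parsed as markup when assigned via innerHTML.
function unsafeHtml( input ) {
    return '<div>' + input + '</div>';
}

// Escape markup-significant characters; this is what text-node APIs such
// as document.createTextNode() effectively guarantee for you.
function escapeHtml( input ) {
    return input.replace( /&/g, '&amp;' )
                .replace( /</g, '&lt;' )
                .replace( />/g, '&gt;' )
                .replace( /"/g, '&quot;' );
}

// Safe style: the same input ends up as literal text, not a new element.
function safeHtml( input ) {
    return '<div>' + escapeHtml( input ) + '</div>';
}
```

The difference only matters once untrusted input flows into the string; 
HTML built entirely from static literals carries no injection risk either way.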
> * The identity crisis. The whole js2 concept encourages code which is
> poorly integrated with the rest of MediaWiki, and which is written
> without proper study of the existing code or thought to refactoring.
> It's like SkinTemplate except with a more pretentious name. I'd like
> to get rid of all instances of "js2", to move its scripts into other
> directories, and to remove the global variables which turn it on and
> off. Also the references to MetavidWiki and the mv prefixes should be
> fixed.
>   

Yes, being "stand alone" is a primary "feature" of the concept... The 
whole mwEmbed system can "stand alone", and that will enable us to easily 
share interface components with other CMSes or platforms. This enables us 
to share things like the add-media-wizard with a blog that wants to insert 
an asset from Commons or from a set of free-licensed repositories. It enables 
3rd parties to remotely embed video clips and do mash-ups with the timed 
text and MediaWiki API calls. Or to just use the Firefogg encoder as a 
stand-alone application, and/or use any edit tools we integrate for 
image / audio / video manipulation.

You can compare it to the Google API thing you mentioned early on: it's 
very convenient to do a single load call and get everything you need 
from the Google application interfaces. The API is one level of 
supporting external integrations. An application-level interface for 
external applications is another level that holds interesting 
possibilities in my mind, but it is a fundamentally new direction for 
MediaWiki.

> * Lack of modularisation. The proposed registration system makes it
> possible to have extensions which are almost entirely client-side
> code. A module like libClipEdit could be moved to its own extension. I
> see no problem with extensions depending on other extensions, the SMW
> extensions do this with no problems.
>   

I am not entirely against extension-based modularization, and we 
definitely need to support it for extensions that depend on PHP code.

But it's nice to be able to pull any part of the application from any 
point. For example, in the add-media-wizard, for the description of assets, 
I will want to pull in the wikiEditor to support formatting in the 
description of the imported asset. It sucks to have to check whether a 
component is available all the time.

Imagine the sequencer, which depends on pretty much everything in the 
mwEmbed directory. For it to resolve all its dependencies across a 
half-dozen extensions and "versions of extensions" in different locations 
will not be fun.

And of course, we would have to build a separate packaging system for the 
application to work as a stand-alone tool.

It would also make it nearly impossible to test any component stand-alone, 
since everything would depend on the MediaWiki framework to get up and 
running. Testing components stand-alone has been very valuable.

A single client-side code repository helps ensure consistency of 
included modules; i.e., we won't have multiple versions of jQuery, jQuery 
UI, or any other reusable component used across multiple 
interfaces and conflicting in our loading system. (Presently we have a 
lot of copies of jQuery and its plugins in extensions, for example.)

If this is the ultimate blocker in your mind, I could restructure things 
as scattered across extensions. It's not entirely painful to refactor 
things that way, since everything is loaded via the JS script-loader helpers, 
but the above-mentioned issues would be a bummer.

I would prefer that we treat the JavaScript components/folders 
within the mwEmbed folder as "client-side modules", distinct from 
PHP code, so they do not need to be tied to a PHP extension. Moving 
directories around won't inherently improve "modularity". Perhaps we 
need a way to include just portions of the JavaScript set? We can 
always strip folders in releases. Perhaps it should be moved to a 
separate directory and only parts of it copied over at deployment time?

>
> A few ideas for cool future features also occur to me. Once we have a
> system set up for generating and caching client-side resources, why not:
>
> * Allow the user to choose a colour scheme for their wiki and
> automatically generate stylesheets with the appropriate colours.
>
> * Include images in the system. Use GD to automatically generate and
> cache images with the appropriate anti-aliased background colour.
>
> * Automatically create CSS sprites?
>   

Don't forget about localization packaging, which was a primary motivation 
for the script-loader to begin with ;)

peace,
--michael

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
