I have not seen a comprehensive overview of MediaWiki localisation discussed 
on the lists I am posting this message to, so I thought I would give it a try. 
All statistics are based on MediaWiki 1.12 alpha, SVN version r29106.

==Introduction==
*Localisation or L10n - the process of adapting the software to be as familiar 
as possible to a specific locale (in scope)
*Internationalisation or i18n - the process of ensuring that an application is 
capable of adapting to local requirements (out of scope)

MediaWiki has a user interface (UI) definition for 319 languages. Of those 
languages at least 17 language codes are duplicates and/or serve a purpose for 
usability[1]. Reporting on them, however, is not relevant. So MediaWiki in its 
current state supports 302 languages. To be able to generate statistics on 
localisation, a MessagesXx.php file should be present in languages/messages. 
There currently are 262 such files, of which 16 are redirects from the 
duplicates/usability group[2]. So MediaWiki has an active in-product 
localisation for 236 languages. 66 languages have an interface, but simply fall 
back to English.
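
For anyone who wants to reproduce the file count, here is a minimal sketch in Python (not part of MediaWiki). It assumes the naming convention in which the file for a language code is "Messages" plus the code with its first letter capitalised and hyphens written as underscores, for example MessagesZh_min_nan.php for zh-min-nan. The function names are made up for this example, and the directory is assumed to be relative to a local checkout.

```python
import re
from pathlib import Path

# Assumed naming convention: "Messages" + capitalised code, "-" written as "_".
MESSAGES_FILE = re.compile(r"^Messages([A-Z][A-Za-z0-9_]*)\.php$")


def code_from_filename(filename):
    """Return the language code for a MessagesXx.php file name, or None."""
    match = MESSAGES_FILE.match(filename)
    if match is None:
        return None
    return match.group(1).lower().replace("_", "-")


def language_codes(messages_dir):
    """List the language codes that have a messages file in messages_dir."""
    codes = []
    for path in sorted(Path(messages_dir).glob("Messages*.php")):
        code = code_from_filename(path.name)
        if code is not None:
            codes.append(code)
    return codes


if __name__ == "__main__":
    # Assumed location inside a local checkout; adjust as needed.
    directory = Path("languages/messages")
    if directory.is_dir():
        print(len(language_codes(directory)), "message files found")
```

Note that this only counts files; telling real localisations apart from the redirect files would need a look at the file contents.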

The MediaWiki core product recognises several collections of localisable 
content (three of which are defined in messageTypes.inc):
* 'normal' messages that can be localised (1726)
* optional messages that can be localised, which usually only happens for 
languages not using a Latin script (161)
* ignored messages that should not be localised (100)
* namespace names and namespace aliases (17)
* skin names (7)
* magic words (120)
* special page names (76)
* other (directionality, date formats, separators, book store lists, link 
trail, and others)

Localisation of MediaWiki revolves around all of the above. Reporting is done 
on the normal messages only.
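
To illustrate how the first three collections relate to each other, the sketch below splits a set of message keys into normal, optional and ignored groups. It is written in Python rather than MediaWiki's own PHP, the function name and sample keys are made up, and letting 'ignored' take precedence over 'optional' is my assumption. The real lists live in messageTypes.inc.

```python
def classify_messages(all_keys, optional_keys, ignored_keys):
    """Split message keys into 'normal', 'optional' and 'ignored' groups.

    Assumption: a key listed as ignored wins over optional. Everything in
    neither list counts as a normal message, the group reporting is based on.
    """
    optional_keys = set(optional_keys)
    ignored_keys = set(ignored_keys)
    groups = {"normal": [], "optional": [], "ignored": []}
    for key in all_keys:
        if key in ignored_keys:
            groups["ignored"].append(key)
        elif key in optional_keys:
            groups["optional"].append(key)
        else:
            groups["normal"].append(key)
    return groups


# Hypothetical sample keys, for illustration only.
sample = classify_messages(
    all_keys=["key-a", "key-b", "key-c", "key-d"],
    optional_keys=["key-b"],
    ignored_keys=["key-c"],
)
print({name: len(keys) for name, keys in sample.items()})
```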

MediaWiki is more than just the core product. On 
http://www.mediawiki.org/wiki/Category:All_extensions some 750 extensions have 
some kind of documentation. This analysis will scope only to the code currently 
present in svn.wikimedia.org/svnroot/mediawiki/trunk. The source code 
repository contains give or take 230 extensions. Of those 230 extensions, about 
140 contain messages that can be visible in the UI in some use case (debugging 
excluded). Out of those 140, about 10 extensions have an exotic implementation 
for localisation or no localisation support at all (just English text in the code). 
10 extensions appear to be outdated. I have seen about 5 different 'standard' 
implementations of i18n in extensions. Since MediaWiki 1.11 there is 
wfLoadExtensionMessages. Not that many extensions use this yet for message 
handling. If you can help add more standard i18n support for extensions (an 
overview can be found at http://translatewiki.net/wiki/User:Siebrand/tobeadded) 
or help in standardising L10n for extensions, please do not hesitate.

==MediaWiki localisation in practice==
Localisation of MediaWiki is currently done in the following ways I am aware of:
* in local wikis: Sysops on local wikis shape and translate messages to fit 
their needs. This is being done in wikis that are part of Wikimedia, Wikia, 
Wikitravel, corporate wikis, etc. This type of localisation has the fewest 
benefits for the core product and extensions because it happens completely out 
of the scope of svn committers. I have heard Wikia supports languages that are 
not supported in the svn version. I would like to get some help in identifying 
and contacting these communities to try and get their localisations in the core 
product. Together with SPQRobin, I am trying to get what has been localised in 
local Wikipedias into the core product and recruit users that worked on the 
localisation to work on a more centralised way of localisation (see Betawiki)
* through bugzilla/svn: A user of MediaWiki submits patches for core messages 
and/or extensions. These users are mostly part of a wiki community that is part 
of Wikimedia. The patches are usually taken care of by committers raymond, 
rotemliss, and sometimes others. Some users maintain a language directly on SVN. 
At the moment, 10-15 languages are maintained this way: Danish, German, Persian, 
Hebrew, Indonesian, Kazakh (3 scripts), Chinese (3 variants), and some more 
less frequently.
* through Betawiki: Betawiki was founded in mid 2005 by Niklas Laxström. In the 
years to follow, Betawiki has grown to be a MediaWiki localisation community 
with over 200 users that has contributed to the localisation of 120 languages 
each month in the past few months. Users that are only familiar with MediaWiki 
as a tool can localise almost every aspect of MediaWiki (except for the group 
'other' mentioned earlier) in a wiki interface. The work of the translators is 
regularly committed to svn by nikerabbit and myself. Betawiki also offers a 
.po export that enables users to use more advanced translation tools to make 
their translation. This option was added recently and no translations in this 
format have been submitted yet. Betawiki also supports translation of 122 
extensions, aiming to support everything that can be supported.

==MediaWiki localisation statistics==
MediaWiki localisation statistics have been around since June 2005 at 
http://www.mediawiki.org/wiki/Localisation_statistics[3]. Traditionally reports 
have focused on the complete set of core messages. Recently a small study of 
message usage was done, which resulted in a set of almost 500 'most often 
used messages in MediaWiki', based on usage of messages on the cluster of 
Wikimedia 
(http://translatewiki.net/wiki/Most_often_used_messages_in_MediaWiki). 

Until recently there were no statistics available on the localisation of 
extensions. Through groupStatistics.php in the extension Translate, these 
statistics can now be created. Reports now cover 'most often used 
MediaWiki messages', 'MediaWiki messages', and 'all extension messages 
supported by extension Translate' (short: extension messages). In addition, a 
meta extension group of 34 extensions used in the projects of Wikimedia has 
been created (short: Wikimedia messages). A regularly updated table of these 
statistics can be found at 
http://translatewiki.net/wiki/Translating:Group_statistics.

Some (arbitrary) milestones have been set for the four collections of messages 
mentioned above. For the usability of MediaWiki in a particular language, the 
group 'core-mostused' is the most important: a language must reach the 
milestone for this group for MediaWiki to have minimal support for that 
language. The Wikimedia language committee is considering using the milestones 
for the first two groups as a requirement for new Wikimedia wikis:
* core-mostused (496 messages): 98%
* wikimedia extensions (354 messages): 90%
* core (1726 messages): 90%
* extensions (1785 messages): 65%
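
As a rough illustration of how these milestones could be checked for one language, here is a small Python sketch. The group sizes and percentages are the ones listed above; the function names and the example counts are hypothetical, and this is not how groupStatistics.php itself works.

```python
# Milestones from the list above: group -> (number of messages, required share).
MILESTONES = {
    "core-mostused": (496, 0.98),
    "wikimedia extensions": (354, 0.90),
    "core": (1726, 0.90),
    "extensions": (1785, 0.65),
}


def passed_milestones(translated):
    """Return, per group, whether the translated count reaches the milestone.

    `translated` maps a group name to the number of translated messages;
    groups missing from it count as zero.
    """
    result = {}
    for group, (total, required) in MILESTONES.items():
        done = translated.get(group, 0)
        result[group] = done / total >= required
    return result


def passes_first_two(translated):
    """Check the two groups under consideration for new Wikimedia wikis."""
    status = passed_milestones(translated)
    return status["core-mostused"] and status["wikimedia extensions"]


# Hypothetical counts for one language, for illustration only.
example = {"core-mostused": 490, "wikimedia extensions": 300, "core": 1200}
print(passed_milestones(example))
print(passes_first_two(example))
```

In this made-up example the language passes core-mostused (490 of 496) but not the Wikimedia extensions group (300 of 354), so it would not yet meet the proposed requirement.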

Currently the following numbers of languages have passed the above milestones:
* core-mostused: 47 (15.6% of supported languages)
* wikimedia extensions: 10 (3.3% of supported languages)
* core: 49 (16.2% of supported languages)
* extensions: 7 (2.3% of supported languages)
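
The percentages are simply the number of languages divided by the 302 supported languages. A short Python sketch of that calculation follows; the variable and function names are mine, and the result is rounded to one decimal.

```python
# 319 interface definitions minus 17 duplicate/usability codes.
SUPPORTED_LANGUAGES = 302

# Number of languages that passed each milestone, as listed above.
passed = {
    "core-mostused": 47,
    "wikimedia extensions": 10,
    "core": 49,
    "extensions": 7,
}


def share_of_supported(count, supported=SUPPORTED_LANGUAGES):
    """Return count as a percentage of supported languages, to one decimal."""
    return round(100 * count / supported, 1)


for group, count in passed.items():
    print(f"{group}: {count} ({share_of_supported(count)}% of supported languages)")
```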

==Conclusion==
So... Are we doing well on localisation or do we suck? My personal opinion is 
that we are somewhere in between. Observing that there are some 250 Wikipedias 
that all use the Wikimedia Commons media repository, and that only 47 languages 
have a minimal localisation, we could do better. With Single User Login around 
the corner (isn't it?), we must do better. On the other hand, new language 
projects within Wikimedia all have excellent localisation of the core product. 
These languages include Asturian, Bikol Central, Lower Sorbian, Extremaduran, 
and Galician. But where is Hindi, for example, with currently only 7% of core 
messages translated?

With the Wikimedia Foundation aiming to put MediaWiki to good use in developing 
countries and products like NGO-in-a-box that include MediaWiki, the potential 
of MediaWiki as a tool in creating and preserving knowledge in the languages of 
the world is huge. We have to tap into that potential and *you* (yes, I am glad 
you read this far and are now reading my appeal) can help. If you know people 
that are proficient in a language and like contributing to localisation, please 
point them in the right direction. If you know of organisations that can help 
localise MediaWiki, please approach them and ask them to help.

We have all the tools now to successfully localise MediaWiki into any of the 
7000 or so languages that have been classified in ISO 639-3. We only need one 
person per language to make it happen. Reaching the first two milestones 
(core-mostused and wikimedia extensions) takes about 16 hours of work. Using 
Betawiki or the .po export, little to no technical knowledge is required.

This was the pitch. How about we aim to roughly double the numbers, or better, 
by the end of 2008, to:
* core-mostused: 120
* wikimedia extensions: 50
* core: 90
* extensions: 20

I would like to wish everyone involved in any aspect of MediaWiki a wonderful 
2008.

Cheers!

Siebrand Mazeland

[1] als,crh,iu,kk,kk-cn,kk-kz,kk-tr,ku,sr,sr-jc,sr-jl,zh,zh-cn,zh-sg,zh-hk,zh-min-nan,zh-yue
[2] crh,iu,kk,kk-cn,kk-kz,kk-tr,ku,sr,sr-jc,sr-jl,zh,zh-cn,zh-sg,zh-hk,zh-min-nan,zh-yue
[3] older locations are http://www.mediawiki.org/wiki/Localisation_statistics/stats 
    and http://meta.wikimedia.org/wiki/Localization_statistics


_______________________________________________
Translators-l mailing list
http://lists.wikimedia.org/mailman/listinfo/translators-l