Alec Conroy wrote:
 > The recent elections showed us that language issues and translation
 > are something we have to take very seriously from now on.  As a first
 > step towards improving communication, it seems like we should get an
 > idea of which users speak which languages?
 >
 > We could directly ask them to tell us, but upon reflection, the
 > information is already hidden in our database.  A multilingual user is
 > one that actively edits two projects of different languages.

Many users have already told us, through Babel templates. Those also
state how proficient they are in each language (native level, basic
skills...).


 > In devising a comprehensive translation strategy, we need to know how
 > interconnected any two given projects are.   We also need to know how
 > connected any given project is to English, since it's our working
 > language.

There's also the motivation factor. I am not much of a translator,
although I have fixed bad translations that I ran into as an ordinary
user and that had been sitting there for days.
From what I have seen in the past, many translations aren't done by
skilled people but by whoever was motivated enough to do them, and the
result is sometimes at machine-translation level.
However, as the people running the event obviously don't know every
language, they have to rely on the few users who translate, and bad
texts pass as 'translated'.


 > We need to pay special attention to languages that are very 'distant'
 > from English-- distant in the sense of having few members who fluent
 > in both English and the language in question.
 >
 > Could someone aid me in getting this data, or explaining why I don't
 > need it or why we already have it, etc?
 >
 > Specifically, I'm looking for:
 > #   For each non-english-language project, how many of their active
 > users are ALSO active on an english-language project? (the answer is
 > should be a single whole number for each project)

First point: define being active. That should be something like 'more
than X non-minor edits in the last Y weeks.'
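
A minimal sketch of that definition in Python, just to pin down what X
and Y would mean. The function name, the (timestamp, is_minor) input
shape and the parameter names are all made up for illustration; they
are not an existing tool or schema.

```python
from datetime import datetime, timedelta

def is_active(edits, min_edits, weeks, now):
    """Return True if a user has more than `min_edits` non-minor edits
    in the last `weeks` weeks.

    `edits` is an iterable of (timestamp, is_minor) pairs for one user
    on one project (a hypothetical input shape).
    """
    cutoff = now - timedelta(weeks=weeks)
    recent_major = sum(
        1 for ts, is_minor in edits if ts >= cutoff and not is_minor
    )
    return recent_major > min_edits
```

Here min_edits plays the role of X and weeks the role of Y.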

I see a problem in that you are presenting it as a symmetric
relationship, and I don't think it is. I could be very good at
translating something into my mother tongue but inept at translating in
the opposite direction. This is especially true between similar
languages, where a non-speaker can easily grasp the meaning.
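
To illustrate the asymmetry, here is a rough Python sketch that turns
one user's Babel levels into directed (source, target) pairs. The
thresholds, the function name and the idea of ranking N above 5 are my
own assumptions, chosen only to show that the two directions can
differ.

```python
from itertools import permutations

# Babel levels "0".."5" plus "N" (native), mapped to ordinals.
# Ranking "N" above "5" is an assumption made for this sketch.
LEVEL = {"0": 0, "1": 1, "2": 2, "3": 3, "4": 4, "5": 5, "N": 6}

def translation_directions(babel, min_source=2, min_target=4):
    """Return the set of (source, target) language pairs a user could
    plausibly translate, given `babel`: a dict of language code to
    Babel level string (hypothetical input shape).

    Reading the source needs less skill than writing the target, so
    the target threshold is higher (both thresholds are illustrative).
    """
    return {
        (src, dst)
        for src, dst in permutations(babel, 2)
        if LEVEL[babel[src]] >= min_source
        and LEVEL[babel[dst]] >= min_target
    }
```

With these thresholds a user with es-N and en-2 counts for en to es
but not for es to en.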

Also, someone who routinely translates articles from enwiki to xzwiki
would have exactly the profile you want to discover, but could be
skipped for not having enough edits on enwiki.

 > #   For any two projects, how many users are there who are active on
 > both? (answer is a square matrix, roughly 750x750 )
 > #   For any two languages, how many users appear to speak both
 > languages? (answer is a square matrix, roughly 750x750)

I think the answer would actually be three-dimensional, since for each
cell you would have a list of people, the number being just a summary.
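
Roughly, in Python, assuming we already have the set of active users
for each project (the input shape and the function name are
hypothetical):

```python
from collections import defaultdict
from itertools import combinations

def shared_users(active_by_project):
    """Map each unordered project pair to the set of users active on both.

    `active_by_project` is a dict of project name -> set of usernames.
    Pairs are keyed as (a, b) with a < b, and pairs with no shared
    users are left out.
    """
    projects_by_user = defaultdict(set)
    for project, users in active_by_project.items():
        for user in users:
            projects_by_user[user].add(project)

    cells = defaultdict(set)
    for user, projects in projects_by_user.items():
        for a, b in combinations(sorted(projects), 2):
            cells[(a, b)].add(user)
    return dict(cells)
```

The number in each cell of the matrix is then just the size of the
corresponding set.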


 > Does anyone know how to pull this out of the database?    It's an
 > important question for us to recruit translators and really just
 > assess "where we are" in terms of inter-project language capabilities.
 >
 > Alec

I think I can build you something if you give me appropriate values for
X and Y in the definition of 'active' above.

Cheers



_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
