On Fri, Jul 26, 2013 at 3:25 PM, David Cuenca <[email protected]> wrote:
> This is the preliminary draft:
>
> https://meta.wikimedia.org/wiki/Collaborative_Machine_Translation_for_Wikipedia

The linked page says:

> For this kind of project it is prefered to use a rule-based machine
> translation<https://en.wikipedia.org/wiki/en:Rule-based_machine_translation>
> system, because total control is wanted over the whole process and
> minority languages should be accounted for (not that easy with
> statistical-based<https://en.wikipedia.org/wiki/en:Statistical_machine_translation>
> MT, where parallel corpora may be non-existing).

This statement seems rather defeatist to me. Step one of a machine
translation effort should be to provide tools to annotate parallel texts
in the various wikis, and to edit and maintain their parallelism. Once
this is done, you have a substantial parallel corpus, which can then be
used to grow the set of translated articles.

That is, minority languages ought to be accounted for by progressively
expanding the number of translated articles in their encyclopedia, as we
do now. As this is done, machine translation incrementally improves.

If there is not enough of an editor community to translate articles, I
don't see how you will succeed in the much more technically demanding
task of creating rules for a rule-based translation system. The beauty
of the statistical approach is that little special ability is needed.

  --scott

-- 
(http://cscott.net)
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
