Re: [Wikitech-l] Collaborative machine translation for Wikipedia -- proposed strategy

C. Scott Ananian Fri, 26 Jul 2013 20:30:41 -0700

On Fri, Jul 26, 2013 at 3:25 PM, David Cuenca <[email protected]> wrote:


> This is the preliminary draft:
>
> https://meta.wikimedia.org/wiki/Collaborative_Machine_Translation_for_Wikipedia


The linked page says:

> For this kind of project it is preferred to use a rule-based machine
> translation <https://en.wikipedia.org/wiki/en:Rule-based_machine_translation>
> system, because total control is wanted over the whole process and minority
> languages should be accounted for (not that easy with statistical-based
> <https://en.wikipedia.org/wiki/en:Statistical_machine_translation> MT,
> where parallel corpora may be non-existing).


This statement seems rather defeatist to me.  Step one of a machine
translation effort should be to provide tools to annotate parallel texts in
the various wikis, and to edit and maintain their parallelism.  Once this
is done, you have a substantial parallel corpus, which can then be used to
help grow the set of translated articles.  That is, minority languages ought
to be accounted for by progressively expanding the number of translated
articles in their encyclopedia, as we do now.  As this is done, machine
translation incrementally improves.  If there is not enough of an editor
community to translate articles, I don't see how you will succeed in the
much more technically demanding task of writing rules for a rule-based
translation system.  The beauty of the statistical approach is that little
specialized expertise is needed.
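To illustrate why the statistical approach needs so little special ability, here is a minimal sketch, assuming a simplified IBM Model 1 (a classic word-alignment model) with no NULL word. It learns word translation probabilities purely from sentence-aligned parallel text via expectation-maximization, with no hand-written rules. The function name `train_ibm_model1` and the toy German/English corpus are hypothetical illustrations, not part of any Wikimedia tool.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate word translation probabilities t[(tgt, src)] = P(tgt | src).

    pairs: list of (src_tokens, tgt_tokens) sentence pairs.
    Simplified IBM Model 1 (hypothetical helper for illustration):
    uniform initialisation, EM updates, and no NULL source word.
    """
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    # Start with every target word equally likely for every source word.
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (tgt, src) links
        total = defaultdict(float)  # expected count of links from each src word
        # E-step: spread each target word over the source words it may align to.
        for src, tgt in pairs:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: renormalise expected counts into probabilities per source word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


# Hypothetical toy corpus: German source, English target.
corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_ibm_model1(corpus)
print(round(t[("house", "Haus")], 3), round(t[("the", "Haus")], 3))
```

Even on three sentence pairs, EM shifts probability mass so that "Haus" prefers "house" over "the", because "the" is already explained by "das" elsewhere in the corpus. Each newly aligned article pair adds evidence of this kind; modern phrase-based or neural systems go much further, but they rest on the same kind of parallel data.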
  --scott

-- 
(http://cscott.net)
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l