Re: [Wikitech-l] Do we have any data in wikidata / wiktionary that could be used for mechanic translations?

Lars Aronsson Thu, 22 May 2014 09:04:10 -0700

On 05/22/2014 05:41 PM, Petr Bena wrote:

> I was looking for a free (possibly open source) provider of automatic
> translations for an open source application I am working on, and had
> quite some trouble finding one. Then I realized we have a project called
> "wiktionary" which could possibly help me here (I was assuming it's an
> open dictionary), but I was quite disappointed, as I couldn't find any
> simple way to perform simple queries like:


There are several open-source machine translation projects.
They are either rule-based or statistics-based. One of the
rule-based projects is Apertium.

When you start from zero, building a rule-based system
gives you a useful system quite fast, especially if the
two languages are similar. A statistics-based system (such
as Google Translate) requires enormous amounts of
data to become useful.

Building a machine translation system is not something
that you can start as a subproject within Wiktionary, or
even as a separate WMF project. It's a very large task.

One naive approach is to base a statistics-based
machine translator (SMT) on the European Union's
freely available parallel text corpus. When you try
to translate Finnish "terve" (which means: hello!)
into English in such a system, it will say "health",
since the same word also means health, and EU
texts only talk about healthcare, never "hello".
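
The domain bias can be sketched with a toy example. This is only
an illustration, not how a real SMT system works internally: it
picks the English word with the highest Dice co-occurrence score
for a source word. The sentence pairs are invented (they are not
taken from the EU corpus), the source tokens f1, f2, ... are
placeholders rather than real Finnish, and the name
best_translation is made up for this sketch.

```python
from collections import Counter

# Toy "parallel corpora": (source tokens, English tokens) pairs.
# Invented for illustration; only "terve" is a real Finnish word,
# the other source tokens (f1, f2, ...) are placeholders.
HEALTH_DOMAIN = [
    (["terve", "f1"], ["health", "policy"]),
    (["terve", "f2"], ["health", "care"]),
    (["f3", "terve"], ["public", "health"]),
]
GREETING_DOMAIN = [
    (["terve", "f4"], ["hello", "friend"]),
    (["terve", "f5"], ["hello", "everyone"]),
    (["terve"], ["hello"]),
    (["f6", "terve"], ["well", "hello"]),
]


def best_translation(word, corpus):
    """Return the English word with the highest Dice score for `word`.

    Returns None if `word` never occurs on the source side.
    """
    src_count = 0           # sentence pairs whose source side has `word`
    tgt_counts = Counter()  # sentence pairs containing each English word
    joint = Counter()       # sentence pairs containing both
    for src, tgt in corpus:
        tgt_set = set(tgt)
        tgt_counts.update(tgt_set)
        if word in src:
            src_count += 1
            joint.update(tgt_set)
    if src_count == 0:
        return None

    def dice(t):
        return 2 * joint[t] / (src_count + tgt_counts[t])

    return max(joint, key=dice)


# Trained only on the healthcare-style data, the greeting sense
# cannot be produced at all, because "hello" never appears.
print(best_translation("terve", HEALTH_DOMAIN))
# With greeting-style data added, the other sense becomes available.
print(best_translation("terve", HEALTH_DOMAIN + GREETING_DOMAIN))
```

The point is only that such a system can never output a word
that is absent from its training text, however common that
word is in everyday use.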


--
  Lars Aronsson ([email protected])
  Aronsson Datateknik - http://aronsson.se



_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
