Re: [Wikitech-l] Do we have any data in wikidata / wiktionary that could be used for mechanic translations?

Scott MacLeod Thu, 22 May 2014 09:32:37 -0700

If you, Petr, were going to take a rules-based approach to what you've
outlined above, and use the already existing Wikidata interlinguality,
which I think is based around the 'item with a label' (think of a Wikipedia
encyclopedia article; is this correct?), and build on Wiktionary, could
one 'reduce' Wikidata's interlinguality from an 'item' to a 'word' (while
also anticipating voice, smartphones, and extensibility / scalability to all
7,106+ languages)? What else would be needed, and what would some of the
initial challenges be in beginning this way?
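A minimal Python sketch of the item-versus-word question, assuming an offline snapshot of Wikidata-style items that each carry per-language labels. The item IDs and labels below are hypothetical stand-ins, not real Wikidata entries, and no Wikidata API is called. It illustrates one likely initial challenge: the same word can be the label of several items, so translating word by word through items needs a sense-selection step that the labels alone do not provide.

```python
from collections import defaultdict

# Hypothetical Wikidata-style items: item ID -> {language code: label}.
# These IDs and labels are illustrative only, not real Wikidata entries.
ITEMS = {
    "Q_FINANCIAL_BANK": {"en": "bank", "de": "Bank", "fi": "pankki"},
    "Q_RIVER_BANK": {"en": "bank", "de": "Ufer", "fi": "joenranta"},
    "Q_HOUSE": {"en": "house", "de": "Haus", "fi": "talo"},
}


def build_label_index(items):
    """Map (language, lower-cased label) to every item ID carrying that label."""
    index = defaultdict(list)
    for item_id, labels in items.items():
        for lang, label in labels.items():
            index[(lang, label.lower())].append(item_id)
    return index


def translate_word(word, source, target, items, index):
    """Return every candidate translation of `word`, one per matching item.

    More than one candidate means the word is ambiguous: an item-level
    lookup cannot pick the right sense without extra context.
    """
    candidates = []
    for item_id in index.get((source, word.lower()), []):
        label = items[item_id].get(target)
        if label is not None:
            candidates.append(label)
    return candidates


index = build_label_index(ITEMS)
print(translate_word("house", "en", "de", ITEMS, index))  # ['Haus']
print(translate_word("bank", "en", "de", ITEMS, index))   # ['Bank', 'Ufer']
```

Here "house" yields a single candidate, while "bank" yields both "Bank" and "Ufer", and something outside the label data would have to choose between them.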

Cheers,
Scott

(I write the above in the context of developing wiki, CC MIT OCW-centric
WUaS for free online university degrees, which plans to be in all 7,106+
languages as schools, http://worlduniversity.wikia.com/wiki/Languages , and
to develop a universal translator as well,
http://worlduniversity.wikia.com/wiki/WUaS_Universal_Translator .)

On Thu, May 22, 2014 at 9:03 AM, Lars Aronsson <[email protected]> wrote:

> On 05/22/2014 05:41 PM, Petr Bena wrote:
>
>> I was looking for a free (possibly open source) provider of automatic
>> translations for my open source application I am working on, and had
>> quite a lot of trouble finding one. Then I realized we have a project
>> called "wiktionary" which could possibly help me here (I was assuming
>> it's an open dictionary), but I was quite disappointed, as I couldn't
>> find any simple way to perform simple queries like:
>>
>
> There are several open-source machine translation projects.
> They are either rule-based or statistics-based. One of the
> rule-based projects is Apertium.
>
> When you start from zero, building a rule-based system
> gives you a useful system quite fast, especially if the
> two languages are similar. A statistics-based system (such
> as Google Translate) requires enormous amounts of
> data to become useful.
>
> It's not something that you can start as a subproject
> within Wiktionary, not even as a separate WMF project.
> It's a very large task.
>
> One naive approach is to base a statistics-based
> machine translator (SMT) on the European Union's
> freely available parallel text corpus. When you try
> to translate Finnish "terve" (which means: hello!)
> into English in such a system, it will say "health",
> since the same word also means health, and EU
> texts only talk about healthcare, never "hello".
>
>
> --
>   Lars Aronsson ([email protected])
>   Aronsson Datateknik - http://aronsson.se
>
>
>
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
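Lars's "terve" example can be reproduced in miniature. The Python sketch below assumes a tiny, hypothetical Finnish-English corpus written to mimic EU-style text (simplified word sequences, not grammatical Finnish and not real EU data) and estimates word translation probabilities with IBM Model 1, a classic expectation-maximisation word-alignment model from statistical MT, here without the usual NULL word. Since "terve" only ever co-occurs with "health", that is the translation the model learns; "hello" never appears in the data, so it cannot be learned at all.

```python
from collections import defaultdict

# Hypothetical, simplified Finnish-English sentence pairs that mimic
# EU-style text: "terve" only ever appears alongside "health".
# These are toy examples, not taken from the real EU corpus.
CORPUS = [
    ("terve politiikka", "health policy"),
    ("terve ja turvallisuus", "health and safety"),
    ("politiikka ja turvallisuus", "policy and safety"),
    ("turvallisuus politiikka", "safety policy"),
]


def train_ibm_model1(corpus, iterations=10):
    """Estimate t[f][e] = P(English word e | Finnish word f) with IBM Model 1 EM.

    The NULL source word is omitted for brevity.
    """
    pairs = [(fi.split(), en.split()) for fi, en in corpus]
    english_vocab = {e for _, en in pairs for e in en}
    # Start from a uniform distribution over English words.
    uniform = 1.0 / len(english_vocab)
    t = defaultdict(lambda: defaultdict(lambda: uniform))

    for _ in range(iterations):
        counts = defaultdict(lambda: defaultdict(float))
        for fi_words, en_words in pairs:
            for e in en_words:
                # E-step: share each English word among the Finnish words
                # in proportion to the current translation probabilities.
                norm = sum(t[f][e] for f in fi_words)
                for f in fi_words:
                    counts[f][e] += t[f][e] / norm
        # M-step: renormalise the expected counts per Finnish word.
        t = defaultdict(lambda: defaultdict(float))
        for f, e_counts in counts.items():
            total = sum(e_counts.values())
            for e, c in e_counts.items():
                t[f][e] = c / total
    return t


def best_translation(t, word):
    """Return the most probable English translation of a Finnish word, or None."""
    candidates = t.get(word)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


model = train_ibm_model1(CORPUS)
print(best_translation(model, "terve"))  # health
```

The point is not the model itself but the data: a statistics-based system can only pick among translations its corpus actually contains, which is why a healthcare-heavy corpus turns a greeting into "health".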

-- 
http://scottmacleod.com/worlduniversityandschool.htm

This email is intended only for the use of the individual or entity to
which it is addressed and may contain information that is privileged and
confidential. If the reader of this email message is not the intended
recipient, you are hereby notified that any dissemination, distribution, or
copying of this communication is prohibited. If you have received this
email in error, please notify the sender and destroy/delete all copies of
the transmittal. Thank you.