If you, Petr, were going to take a rule-based approach to what you've outlined above, and use the already existing Wikidata interlinguality, which I think is based around the 'item with a label' (think of a Wikipedia encyclopedia article; is this correct?), and build on Wiktionary, could one 'reduce' Wikidata's interlinguality from an 'item' to a 'word' (and also co-anticipate voice, smartphones, and extensibility / scalability to all 7,106+ languages, for example)? What else would be needed, and what would be some of the initial challenges to beginning this way?

Cheers, Scott

(I write the above in the context of developing wiki CC MIT OCW-centric WUaS for free online university degrees, and which plans to be in all 7106+ languages http://worlduniversity.wikia.com/wiki/Languages as schools, and develop a universal translator - http://worlduniversity.wikia.com/wiki/WUaS_Universal_Translator - as well).

On Thu, May 22, 2014 at 9:03 AM, Lars Aronsson <[email protected]> wrote:

> On 05/22/2014 05:41 PM, Petr Bena wrote:
>
>> I was looking for a free (possibly open source) provider of automatic
>> translations for my open source application I am working on and quite
>> had troubles finding some. Then I realized we have a project called
>> "wiktionary" which could possibly (I was assuming it's open
>> dictionary) help me here, but I was quite disappointed as I couldn't
>> find any simple way to perform simple queries like:
>>
>
> There are several open-source machine translation projects.
> They are either rule-based or statistics-based. One of the
> rule-based projects is Apertium.
>
> When you start from zero, building a rule-based system
> gives you a useful system quite fast, especially if the
> two languages are similar. A statistics-based system (such
> as Google Translate) requires enormous amounts of
> data to become useful.
>
> It's not something that you can start as a subproject
> within Wiktionary, not even as a separate WMF project.
> It's a very large task.
>
> One naive approach is to base a statistics-based
> machine translator (SMT) on the European Union's
> freely available parallel text corpus. When you try
> to translate Finnish "terve" (which means: hello!)
> into English in such a system, it will say "health",
> since the same word also means health, and EU
> texts only talk about healthcare, never "hello".
>
> --
> Lars Aronsson ([email protected])
> Aronsson Datateknik - http://aronsson.se
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

--
http://scottmacleod.com/worlduniversityandschool.htm
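
As a rough illustration of Lars's "terve" point quoted above, here is a minimal Python sketch of a word-level, frequency-based translator. It is not how Apertium, Google Translate, or any real SMT system works; it only shows why a system trained on domain-skewed text picks the domain's sense of a word. The function names (`train`, `translate_word`) and the aligned word pairs are hypothetical toy data made up for this example, not drawn from the EU corpus.

```python
from collections import Counter, defaultdict


def train(aligned_pairs):
    """Count how often each source word was aligned with each target word."""
    table = defaultdict(Counter)
    for source_word, target_word in aligned_pairs:
        table[source_word][target_word] += 1
    return table


def translate_word(table, word):
    """Return the most frequently aligned target word, or None if unseen."""
    candidates = table.get(word)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Toy, invented alignments (an assumption for illustration only).
# A healthcare-only corpus never pairs "terve" with a greeting.
eu_like_pairs = [("terve", "health")] * 9 + [("terveys", "health")] * 3

# The same data plus invented conversational text where "terve" is a greeting.
mixed_pairs = eu_like_pairs + [("terve", "hello")] * 20

eu_table = train(eu_like_pairs)
mixed_table = train(mixed_pairs)

print(translate_word(eu_table, "terve"))     # sense seen in the skewed corpus
print(translate_word(mixed_table, "terve"))  # sense that dominates once greetings appear
```

The only thing that changes between the two tables is the training text, which is the gist of the objection: the choice of corpus, not the algorithm, decides which sense wins.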
