Re: [Wikimedia-l] The case for supporting open source machine translation

2013-05-22 Thread Federico Leva (Nemo)
Erik Moeller, 24/04/2013 08:29: [...] Could open source MT be such a strategic investment? I don't know, but I'd like to at least raise the question. I think the alternative will be, for the foreseeable future, to accept that this piece of technology will be proprietary, and to rely on goodwill

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-30 Thread Chris Tophe
2013/4/29 Mathieu Stumpf psychosl...@culture-libre.org On 2013-04-26 20:27, Milos Rancic wrote: OmegaWiki is a masterpiece from the perspective of one [computational] linguist. Erik made the structure so well that it's the best starting point to create a contemporary multilingual

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-29 Thread Nikola Smolenski
On 26/04/13 19:38, Bjoern Hoehrmann wrote: * Andrea Zanni wrote: At the moment, Wikisource could be an interesting corpus and laboratory for improving and enhancing OCR, as the OCR-generated text is always proofread and corrected by humans. Try also Distributed Proofreaders. It is my

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-29 Thread Mathieu Stumpf
On 2013-04-26 17:00, Gerard Meijssen wrote: Hi, When we invest in MT it is to convey knowledge, information and primarily Wikipedia articles. They do not have the same problems poetry has. With explanatory articles on a subject there is a web of associated concepts. These concepts are

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-29 Thread Mathieu Stumpf
On 2013-04-26 19:57, Samuel Klein wrote: On Fri, Apr 26, 2013 at 1:24 PM, Bjoern Hoehrmann derhoe...@gmx.net wrote: * Erik Moeller wrote: Are there open source MT efforts that are close enough to merit scrutiny? Wiktionary. If you want to help free software efforts in the area of machine

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-29 Thread Mathieu Stumpf
On 2013-04-26 20:27, Milos Rancic wrote: On Fri, Apr 26, 2013 at 7:57 PM, Samuel Klein meta...@gmail.com wrote: Yes. Finding a way to capture and integrate the work OmegaWiki has done into a new Wikidata-powered Wiktionary would be a useful start. And we've already sort of claimed the space

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-27 Thread Ryu Cheol
Thanks to Jane for introducing CoSyne. But I feel all the wikis do not want to be synchronized to certain wikis. Rather than having identical articles, I hope they would have their own articles. I hope I could have two more tabs to the right of the 'Article' and 'Talk' on English Wikipedia for

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-26 Thread Shlomi Fish
Hi all, On Wed, 24 Apr 2013 08:39:55 +0200 Ting Chen wing.phil...@gmx.de wrote: Oh yes, this would really be great. Just think about the money the Foundation gives out meanwhile for translation, plus the many many volunteers' work invested into translation. A free and open translation

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-26 Thread Mathieu Stumpf
On 2013-04-25 20:56, Theo10011 wrote: As far as linguistic typology goes, it's far too unique and too varied to have a language-independent form develop as easily. Perhaps it also depends on the perspective. For example, the majority of people commenting here (Americans, Europeans) might

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-26 Thread Jane Darnell
We already have the translation options on the left side of the screen in any Wikipedia article. This choice is generally a smattering of languages, and a long term goal for many small-language Wikipedias is to be able to translate an article from related languages (say from Dutch into Frisian,

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-26 Thread Bjoern Hoehrmann
* Andrea Zanni wrote: At the moment, Wikisource could be an interesting corpus and laboratory for improving and enhancing OCR, as the OCR-generated text is always proofread and corrected by humans. As part of our project (http://wikisource.org/wiki/Wikisource_vision_development), Micru was

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-26 Thread Samuel Klein
On Fri, Apr 26, 2013 at 1:24 PM, Bjoern Hoehrmann derhoe...@gmx.net wrote: * Erik Moeller wrote: Are there open source MT efforts that are close enough to merit scrutiny? Wiktionary. If you want to help free software efforts in the area of machine translation, then what they seem to need most

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-26 Thread Milos Rancic
On Thu, Apr 25, 2013 at 4:26 PM, Denny Vrandečić denny.vrande...@wikimedia.de wrote: Not just bootstrapping the content. By having the primary content be saved in a language independent form, and always translating it on the fly, it would not merely bootstrap content in different languages, but

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-26 Thread Milos Rancic
On Fri, Apr 26, 2013 at 7:57 PM, Samuel Klein meta...@gmail.com wrote: Yes. Finding a way to capture and integrate the work OmegaWiki has done into a new Wikidata-powered Wiktionary would be a useful start. And we've already sort of claimed the space (though we are neglecting it) -- it's

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-25 Thread Nikola Smolenski
On 24/04/13 12:35, Denny Vrandečić wrote: Current machine translation research aims at using massive machine learning supported systems. They usually require big parallel corpora. We do not have big parallel corpora (Wikipedia articles are not translations of each other, in general), especially

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-25 Thread Nikola Smolenski
On 24/04/13 12:35, Denny Vrandečić wrote: In summary, I see four calls for action right now (and for all of them this means to first actually think more and write down a project plan and gather input on that), that could and should be tackled in parallel if possible: I ) develop a structured

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-25 Thread Erik Moeller
Denny, very good and compelling reasoning as always. I think the argument that we can potentially do a lot for the MT space (including open source efforts) in part by getting our own house in order on the dictionary side of things makes a lot of sense. I don't think it necessarily excludes

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-25 Thread Mathieu Stumpf
On 2013-04-25 04:49, George Herbert wrote: We can't usefully help with internet access (and that's proceeding at a good pace even in the third world), but language will remain a barrier when people get access. In a few situations politics / firewalling is as well (China, primarily), which is

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-25 Thread Brion Vibber
On Thu, Apr 25, 2013 at 7:26 AM, Denny Vrandečić denny.vrande...@wikimedia.de wrote: Not just bootstrapping the content. By having the primary content be saved in a language independent form, and always translating it on the fly, it would not merely bootstrap content in different languages,

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-25 Thread Denny Vrandečić
2013/4/25 Brion Vibber bvib...@wikimedia.org You are blowing my mind, dude. :) Glad to hear :) I suspect this approach won't serve for everything, but it sounds *awesome*. If we can tie natural-language statements directly to data nodes (rather than merely annotating vague references

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-25 Thread Theo10011
On Thu, Apr 25, 2013 at 7:56 PM, Denny Vrandečić denny.vrande...@wikimedia.de wrote: Not just bootstrapping the content. By having the primary content be saved in a language independent form, and always translating it on the fly, it would not merely bootstrap content in different languages,

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-25 Thread George Herbert
This subthread seems headed out into practical / applied epistemology, if there is such a thing. I am not sure if we can get from here to there; that said, a new structure with language independent facts / information points that then got machine-explained or described in a local language would

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-24 Thread Ting Chen
Oh yes, this would really be great. Just think about the money the Foundation gives out meanwhile for translation, plus the many many volunteers' work invested into translation. A free and open translation software is long overdue indeed. Great idea Erik. Greetings Ting On 4/24/2013 8:29 AM,

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-24 Thread MZMcBride
Erik Moeller wrote: Could open source MT be such a strategic investment? I don't know, but I'd like to at least raise the question. I think the alternative will be, for the foreseeable future, to accept that this piece of technology will be proprietary, and to rely on goodwill for any integration

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-24 Thread Federico Leva (Nemo)
A few links: * 2010 discussion: https://strategy.wikimedia.org/wiki/Proposal:Free_Translation_Memory as one of the https://strategy.wikimedia.org/wiki/List_of_things_that_need_to_be_free (follow links, including) * http://www.apertium.org : was used by translatewiki.net but isn't any longer

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-24 Thread Erik Moeller
On Wed, Apr 24, 2013 at 12:06 AM, MZMcBride z...@mzmcbride.com wrote: Though the Wikimedia community seems eager to add new projects (Wikidata, Wikivoyage), I wonder how it can be sensible or reasonable to focus on yet another project when the current projects are largely neglected (Wikinews,

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-24 Thread Mathias Schindler
On Wed, Apr 24, 2013 at 8:29 AM, Erik Moeller e...@wikimedia.org wrote: Could open source MT be such a strategic investment? I don't know, but I'd like to at least raise the question. I think the alternative will be, for the foreseeable future, to accept that this piece of technology will be

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-24 Thread Federico Leva (Nemo)
Erik Moeller, 24/04/2013 10:06: [...] Moreover, the lens of project/domain name is a very arbitrary one to define vertically focused efforts. A good and interesting reasoning here. Indeed something to keep in mind, but which adds problems. There are specialized efforts within Wikipedia

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-24 Thread Mark
On 4/24/13 8:29 AM, Erik Moeller wrote: Are there open source MT efforts that are close enough to merit scrutiny? In order to be able to provide high-quality results, you would need not only a motivated, well-intentioned group of people, but some of the smartest people in the field working on it.

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-24 Thread Denny Vrandečić
Erik, all, sorry for the long mail. Incidentally, I have been thinking in this direction myself for a while, and I have come to a number of conclusions: 1) the Wikimedia movement cannot, in its current state, tackle the problem of machine translation of arbitrary text from and to all of our

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-24 Thread Mark
A brief addendum, On 4/24/13 12:25 PM, Mark wrote: From 2006 through 2012 [the ERC] allocated about $10m to kickstart open-source MT, though focused primarily on European languages, via the EuroMatrix (2006-09) and EuroMatrixPlus (2009-12) research projects. Missed some projects. Seems the

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-24 Thread Fred Bauder
This is closely tied to software which is being developed, some of it secretly, to enable machines to understand and use language. As of now this will be government and corporate owned and controlled. I say closely tied because that is how translation works; only someone or something that

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-24 Thread Andrew Gray
On 24 April 2013 11:35, Denny Vrandečić denny.vrande...@wikimedia.de wrote: If we constrain b) a lot, we could just go and develop pages to display for pages that do not exist yet based on Wikidata in the smaller languages. That's a far cry from machine translating the articles, but it would

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-24 Thread Pavlo Shevelo
only someone or something that understands language can translate perfectly Precisely crude translations into little used languages are nearly worthless due to syntax issues. Useful work requires at least one person fluent in the language It's very true! Current Google MT tools are reasonably

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-24 Thread Nikola Smolenski
On 24/04/13 08:29, Erik Moeller wrote: Could open source MT be such a strategic investment? I don't know, but I'd like to at least raise the question. I think the alternative will be, for the foreseeable future, to accept that this piece of technology will be proprietary, and to rely on goodwill

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-24 Thread Mathieu Stumpf
On 2013-04-24 08:29, Erik Moeller wrote: Are there open source MT efforts that are close enough to merit scrutiny? In order to be able to provide high-quality results, you would need not only a motivated, well-intentioned group of people, but some of the smartest people in the field working

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-24 Thread Andrea Zanni
On Wed, Apr 24, 2013 at 2:04 PM, Mathieu Stumpf psychosl...@culture-libre.org wrote: I would like to add that (I'm no specialist of this subject) translating natural language probably needs at least a large set of existing translations, at least to get rid of obvious well-known idioms like

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-24 Thread Mathieu Stumpf
On 2013-04-24 12:35, Denny Vrandečić wrote: 3) Wiktionary could be an even more amazing resource if we would finally tackle the issue of structuring its content more appropriately. I think Wikidata opened a few avenues to structure planning in this direction and provide some software, but

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-24 Thread Samuel Klein
I really like Erik's original suggestion, and these ideas, Denny. Since there are many different possible goals, it's worth having a page just to list all of the possible different goals and compare them - both how they fit with one another and how they fit with existing active projects elsewhere

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-24 Thread Leslie Carr
(FYI this is me speaking with my personal hat on, none of these opinions are official in any way or the opinions of the foundation as an organization) <personal_hat> While Wikimedia is still only a medium-sized organization, it is not poor. With more than 1M donors supporting our mission and a

Re: [Wikimedia-l] The case for supporting open source machine translation

2013-04-24 Thread George Herbert
Leslie Carr wrote (personally, not officially): I think that while supporting open source machine translation is an awesome goal, it is out of scope of our budget and the engineering budget could be better spent elsewhere, such as with completing existing tools that are in development, but not