Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon
Well, my understanding is that, unfortunately, most companies won't touch anything that's under the GPL, so I don't think that's a solution. We don't want to exclude commercial entities.

Bob

Francis Tyers wrote:

First of all, thanks to CMU for releasing the data. I've no doubt it will be valuable to people working in the field.

I don't particularly like terms like "lawyerbomb" and "obnoxious advertising clause", but this merits a response. People who don't get paid to work on the software they develop, and who aren't employed by big universities or companies, are understandably concerned about getting sued. You can say "but they've never been sued before, so why should they worry?", but this isn't really convincing.

They can get frustrated when people make more work for themselves and others:

* Making up your own 'free/open-source' licence: more work for you, more work for them.
* Choosing an existing, tried and tested 'free/open-source' licence: less work for you, less work for them.

Furthermore, they can also find it frustrating that a non-profit organisation would release its work under a licence that is incompatible with that of over 60% of free software.[1]

Fran

PS. Some of these same issues are reviewed in Ted Pedersen's excellent 2008 article: http://www.d.umn.edu/~tpederse/Pubs/pedersen-last-word-2008.pdf

=Notes=
1. http://www.blackducksoftware.com/oss/licenses#top20

On Fri, 22 Jan 2010 at 18:29 -0500, Job M. van Zuijlen wrote:

Some of the verbiage used in this discussion ("lawyer bomb"...) doesn't particularly encourage people to make their data freely available. What happened to common sense? I think CMU's initiative should be commended.

Job van Zuijlen

From: Robert Frederking
Sent: Friday, January 22, 2010 16:32
To: Francis Tyers
Cc: mt-list@eamt.org
Subject: Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon

I'm not a lawyer, but let me start by stating that our intent was simply that re-use should include acknowledgement.
This was not intended to be a splash screen on every start-up, or to make the software pronounce our names at the start of every sentence. :-) It only has to be clearly visible in anyone's source files. We aren't interested in suing people; we are a non-profit research organization. But like the Regents in California, we have a responsibility to our sponsors to ensure that appropriate credit is given for our work. So this is intended to be like the old BSD advertising clause, which is generally considered to be clear from a legal point of view. Please use the data however you want; just don't say you originally collected it.

Bob

Francis Tyers wrote:

[ Sorry in advance for cross-posting ]

I'm going over this on the debian-legal mailing list (a good place to ask about issues in free/open-source software licensing). There is a question about clause 5 of the licence:

## 5. Any commercial, public or published work that uses this data ##
## must contain a clearly visible acknowledgment as to the ##
## provenance of the data. ##

From debian-legal:

My concern is whether, contrary to the favourable interpretation you give, this is intended to act like an obnoxious advertising clause. In other words, what will satisfy “contain” in “contain a clearly visible acknowledgement”? Is it sufficient for the acknowledgement to be “clearly visible” only after inspecting various files in the source code? Or is the copyright holder's intent that the acknowledgement be clearly visible to every recipient, even those who receive a non-source form of the work? The latter would be a non-free restriction, like the obnoxious advertising clause in the older BSD licenses. This looks, as it is currently worded, more like a lawyerbomb now that I consider it. I would appreciate input on this from legally-trained minds.

Could you confirm whether that clause means that the acknowledgement should be _clearly visible_ to _every recipient_, or would it suffice for it to be visible after inspecting the source code?
Thanks for your help in this, and best regards,

Francis Tyers

On Thu, 21 Jan 2010 at 22:59 -0500, Alon Lavie wrote:

Hi Francis,

Thanks for the suggestion, but we were advised to leave the licensing language as is. Our licensing language is effectively equivalent to the MIT license, and is unambiguous with respect to releasing the data for any use (commercial or non-commercial).

Best regards,
Alon

Francis Tyers wrote:

On Thu, 21 Jan 2010 at 14:49 -0500, Robert Frederking wrote:

The Language Technologies Institute (LTI
Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon
Hi Robert,

These are commendable efforts, but isn't French the principal written language in Haiti? Or are you talking about a speech-to-speech system?

Best regards,
Vadim

- Original Message -
From: Robert Frederking r...@cs.cmu.edu
To: mt_l...@nist.gov; mt-list@eamt.org
Sent: Friday, January 22, 2010 6:49 AM
Subject: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon

The Language Technologies Institute (LTI) of Carnegie Mellon University's School of Computer Science (CMU SCS) is making publicly available the Haitian Creole spoken and text data that we have collected or produced. We are providing this data with minimal restrictions in order to allow others to develop language technology for Haiti, in parallel with our own efforts to help with this crisis. Since organizing the data in a useful fashion is not instantaneous, and more text data is currently being produced by collaborators, we will be publishing the data incrementally on the web, as it becomes available. To access the currently available data, please visit the website at http://www.speech.cs.cmu.edu/haitian/

___ Mt-list mailing list
Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon
Hi Vadim,

Yes, French is the principal written language, but most of the population only speaks Creole (and is illiterate). We ourselves are indeed looking at making speech-based systems (and the rarest part of the data may be the speech data). There may also be unforeseen benefits to the data being available. For example, it appears that Doctors Without Borders (Médecins Sans Frontières) may make use of the bilingual medical phrases as-is, through Translators Without Borders (Traducteurs sans Frontières). So who knows how this may help.

Cheers.
Bob
Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon
Sorry I couldn't intervene quickly enough, guys. I've been dealing with lots of requests from people about the announcement. Bob has mentioned several of the things we are looking at right now, and concrete ways to meet specific needs.

I'm on several networks with many translation agencies and associations of translators/interpreters, with conference calls and email communication on a very regular basis. I am corresponding with many Haitians who want to translate and interpret for their fellow countrymen. The flow of requests for translation from/into so many languages for this crisis is astounding. Current estimates are that potentially 40,000 NGOs will be involved in assisting in this disaster relief effort. I'm now hearing of teams being deployed from Brazil, Bolivia and the Philippines. Cargo ships are now coming from Italy as well. The translation and interpretation needs vary based on the type of content, the intended speaker and receiver of the message, and the means of communication. It is not a one-size-fits-all communication need right now.

As for the languages in Haiti: many of my publications concerning the Haitian Creole language and technologies (https://www.box.net/shared/bz4sq9jx88) also include descriptions of the sociolinguistic factors that affect the approach and means of implementing technologies. Both French and Haitian Creole are official languages in Haiti, in both spoken and written forms. The IPN spelling system for Haitian Creole, adopted in 1979, was in fact enacted as an orthography law. Experts have documented more than 10 different spelling systems over the history of the language, and that does not include the hybrid forms.

The majority of the Haitian population is illiterate. Many reports up to the end of the 1990s put the illiteracy rate, in any language, at 80-90%.
When I was on a trip for CMU in 1998, all of the students we were recording for the speech data could read and write (as well as speak) Haitian Creole very well, compared with the reading level of Haitians of the diaspora whom I had recorded in 1997 on other trips in the US and France.

Communication regarding crowd control, medical treatment and other areas is much more effective in Haitian Creole. Other types of communication between the participating NGOs and other organizations would be from/to French. Much depends on the purpose of the communication and the participants. In this time of psychological and physical trauma, it is much more effective to speak to Haitians in their mother tongue (Haitian Creole). A small part of the population does grow up hearing only French at home, and many speak Creole until they start going to school and then learn French, just as is the case in various African countries.

It is sad that it took a disaster for Haitian Creole to receive so much recognition as a language.

I'm not subscribed to mt_l...@nist.gov, so could someone please repost this to that list? It will certainly bounce back to me.

Jeff
http://www.linkedin.com/in/jeffallen