Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon
Hmm, yes, actually. Common sense is handy :-). I only recall a handful of lawsuits related to free stuff, and usually the point was to get a lot of money from someone the size of IBM. I rather doubt that free corpora released by a university, for a language without much commercial potential in NLP, would lead to anything like that.

Best regards,
Vadim

----- Original Message -----
From: Job M. van Zuijlen
To: mt-list@eamt.org
Sent: Saturday, January 23, 2010 10:29 AM
Subject: Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon

Some of the verbiage used in this discussion (lawyer bomb...) doesn't particularly encourage people to make their data freely available. What happened to common sense? I think CMU's initiative should be commended.

Job van Zuijlen
Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon
Some of the verbiage used in this discussion (lawyer bomb...) doesn't particularly encourage people to make their data freely available. What happened to common sense? I think CMU's initiative should be commended.

Job van Zuijlen

From: Robert Frederking
Sent: Friday, January 22, 2010 16:32
To: Francis Tyers
Cc: mt-list@eamt.org
Subject: Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon

I'm not a lawyer, but let me start by stating that our intent was simply that re-use should include acknowledgement. This was not intended to be a splash-screen on every start-up, or making the software pronounce our names at the start of every sentence. :-) It only has to be "clearly visible" in anyone's source files.

We aren't interested in suing people; we are a non-profit research organization. But like the Regents in California, we have a responsibility to our sponsors to ensure that appropriate credit is given for our work. So this is intended to be like the old BSD advertising clause, which is generally considered to be clear from a legal point of view.

Please use the data however you want; just don't say you originally collected it.

Bob

___ Mt-list mailing list
Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon
BTW, we are working on a release of a non-trivial amount of parallel text data in English and Haitian Creole (HC), on the same website, hopefully shortly.

http://www.speech.cs.cmu.edu/haitian/

Bob
Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon
P.S.: I just verified that this is the license that the CMU Sphinx and Flite projects have used for their free software for 12 years without problems, with the addition of the advertising clause from BSD, which has also been used without problems.

Cheers.

Bob
Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon
I'm not a lawyer, but let me start by stating that our intent was simply that re-use should include acknowledgement. This was not intended to be a splash-screen on every start-up, or making the software pronounce our names at the start of every sentence. :-) It only has to be "clearly visible" in anyone's source files.

We aren't interested in suing people; we are a non-profit research organization. But like the Regents in California, we have a responsibility to our sponsors to ensure that appropriate credit is given for our work. So this is intended to be like the old BSD advertising clause, which is generally considered to be clear from a legal point of view.

Please use the data however you want; just don't say you originally collected it.

Bob

Francis Tyers wrote:
> [ Sorry in advance for cross posting ]
>
> I'm going over this on the debian-legal mailing list (a good place to ask
> about issues in free/open-source software licensing). There is a question
> about clause 5 of the licence:
>
> ## 5. Any commercial, public or published work that uses this data ##
> ## must contain a clearly visible acknowledgment as to the ##
> ## provenance of the data. ##
>
> From debian-legal:
>
> My concern is whether, contrary to the favourable interpretation you give,
> this is intended to act like an obnoxious advertising clause. In other
> words, what will satisfy “contain” in “contain a clearly visible
> acknowledgement”? Is it sufficient for the acknowledgement to be “clearly
> visible” only after inspecting various files in the source code? Or is the
> copyright holder's intent that the acknowledgement be clearly visible to
> every recipient, even those who receive a non-source form of the work? The
> latter would be a non-free restriction, like the obnoxious advertising
> clause in the older BSD licenses. This looks, as it is currently worded,
> more like a lawyerbomb now that I consider it. I would appreciate input on
> this from legally-trained minds.
>
> Could you confirm if that clause means that the acknowledgement should be
> _clearly visible_ to _every recipient_ or would it suffice to be visible
> after inspecting the source code?
>
> Thanks for your help in this and best regards,
>
> Francis Tyers
Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon
[ Sorry in advance for cross posting ]

I'm going over this on the debian-legal mailing list (a good place to ask about issues in free/open-source software licensing). There is a question about clause 5 of the licence:

## 5. Any commercial, public or published work that uses this data ##
## must contain a clearly visible acknowledgment as to the ##
## provenance of the data. ##

From debian-legal:

My concern is whether, contrary to the favourable interpretation you give, this is intended to act like an obnoxious advertising clause. In other words, what will satisfy “contain” in “contain a clearly visible acknowledgement”? Is it sufficient for the acknowledgement to be “clearly visible” only after inspecting various files in the source code? Or is the copyright holder's intent that the acknowledgement be clearly visible to every recipient, even those who receive a non-source form of the work? The latter would be a non-free restriction, like the obnoxious advertising clause in the older BSD licenses. This looks, as it is currently worded, more like a lawyerbomb now that I consider it. I would appreciate input on this from legally-trained minds.

Could you confirm if that clause means that the acknowledgement should be _clearly visible_ to _every recipient_ or would it suffice to be visible after inspecting the source code?

Thanks for your help in this and best regards,

Francis Tyers

On Thu, 21 Jan 2010 at 22:59 -0500, Alon Lavie wrote:
> Hi Francis,
>
> Thanks for the suggestion, but we were advised to leave the licensing
> language as is. Our licensing language is effectively equivalent to the
> MIT license and is unambiguous with respect to releasing the data for
> any use (commercial or non-commercial).
>
> Best regards,
>
> - *Alon*
>
> Francis Tyers wrote:
> > On Thu, 21 Jan 2010 at 14:49 -0500, Robert Frederking wrote:
> >
> >> The Language Technologies Institute (LTI) of Carnegie Mellon University's
> >> School of Computer Science (CMU SCS) is making publicly available the
> >> Haitian Creole spoken and text data that we have collected or produced. We
> >> are providing this data with minimal restrictions in order to
> >> allow others to develop language technology for Haiti, in parallel with our
> >> own efforts to help with this crisis. Since organizing the data in a useful
> >> fashion is not instantaneous, and more text data is currently being
> >> produced by collaborators, we will be publishing the data incrementally
> >> on the web, as it becomes available. To access the currently available
> >> data, please visit the website at http://www.speech.cs.cmu.edu/haitian/
> >
> > Would you consider also dual/triple licensing the data under an existing
> > free software licence, such as the MIT licence[1] or the GNU GPL[2]?
> > This way it could be combined with existing data under these licences
> > (e.g. the majority of free/open-source software) and researchers and
> > developers don't need to hire legal advice to determine if they can
> > combine their work with yours.
> >
> > Best regards,
> >
> > Fran
> >
> > 1. http://en.wikipedia.org/wiki/MIT_Licence#License_terms
> > 2. http://www.gnu.org/licenses/gpl.html
[Mt-list] DCU MT GROUP RELEASES FREE/OPEN-SOURCE EBMT SYSTEM MARCLATOR
The Centre for Next Generation Localisation (CNGL) Machine Translation group, led by Prof. Andy Way at Dublin City University (DCU), announces the release of Marclator (Marker-based Translator), a free/open-source system for Example-Based Machine Translation (EBMT). This release coincides with the 4th MT Marathon, a week-long event being hosted January 25th-30th by the CNGL and the National Centre for Language Technology (NCLT) at DCU in conjunction with the EuroMatrix+ project, where over 100 participants from 20 countries will have a chance to test and program open-source MT tools and systems.

The Marclator EBMT system release includes a fully functional marker-based chunker/tagger (based on Green's marker hypothesis) with markers for some languages and a chunk aligner, as well as a proof-of-concept naïve (monotone) recombination module or decoder. This free/open-source release results from collaboration with Prof. Mikel L. Forcada of Universitat d'Alacant in Spain, who is currently a visiting researcher within the CNGL MT group at DCU through an ETS Walton Award from Science Foundation Ireland (SFI).

Through SFI funding of the Centre for Next Generation Localisation and additional funding from EU FP7 research projects currently coming on stream, DCU now boasts one of the largest academic research groups focused on MT worldwide. The Marclator release is seen as a first step in a strategy of participation in the free/open-source community, in parallel with a programme of commercial engagement with companies interested in adopting, tuning and deploying machine translation technology.

Over the past number of years, Prof. Andy Way has led the MT group at DCU in pursuing corpus-based approaches to MT. These have culminated in the MaTrEx system, a modular, maintainable and efficient data-driven machine translation system that combines example-based machine translation (EBMT) and statistical machine translation (SMT) and consistently ranks as one of the top-performing MT systems in open machine translation evaluations (e.g. WMT-09, IWSLT-09, etc.).

As a follow-on to the Marclator release, Prof. Way and Prof. Forcada will continue to collaborate toward a free/open-source release of a baseline MaTrEx system, combining Marclator with the Moses SMT decoder. This OpenMaTrEx release is anticipated for Spring 2010.

Resources:
http://www.cngl.ie
http://nclt.dcu.ie/mt
http://www.computing.dcu.ie/~mforcada/fosmt.html
http://www.computing.dcu.ie/~mforcada/marclator.html
http://www.euromatrixplus.net/
http://www.mtmarathon2010.info/web/Welcome.html

For more information please contact: i...@cngl.ie

Mikel L. Forcada
Dept. Llenguatges i Sistemes Informàtics
Universitat d'Alacant, E-03071 Alacant (Spain)
Tel.: +34 96 590 9776
Fax: +34 96 590 9326
Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon
Glen, MT-listers:

Glen wrote:
> Let's not inhibit folks from doing good by making it hard to do so!

I don't think Fran and I are in any way inhibiting anyone from doing good. We are just suggesting ways in which good can be done more effectively, by allowing these resources to be used with the existing free/open-source tools (for a list of free/open-source MT software, see http://computing.dcu.ie/~mforcada/fosmt.html). In fact, some of us were planning to coordinate work on a preliminary free/open-source English-Haitian MT system during the MT Marathon (http://www.mtmarathon2010.info) next week, using available free/open-source resources, hence our concerns.

Best,
Mikel L. Forcada