Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon

2010-01-22 Thread Vadim Berman
Hmm, yes, actually. Common sense is handy :-) .

I only recall a handful of lawsuits related to free stuff, and usually the 
point was to get a lot of money from someone the size of IBM. I kind of 
doubt that free corpora from a university, for a language without much 
commercial potential in NLP, would result in something like this. 

 Best regards,
 Vadim

  - Original Message - 
  From: Job M. van Zuijlen 
  To: mt-list@eamt.org 
  Sent: Saturday, January 23, 2010 10:29 AM
  Subject: Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon


  Some of the verbiage used in this discussion (lawyer bomb...) doesn't 
particularly encourage people to make their data freely available. What 
happened to common sense?  I think CMU's initiative should be commended.

  Job van Zuijlen



Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon

2010-01-22 Thread Job M. van Zuijlen
Some of the verbiage used in this discussion (lawyer bomb...) doesn't 
particularly encourage people to make their data freely available. What 
happened to common sense?  I think CMU's initiative should be commended.

Job van Zuijlen


___
Mt-list mailing list


Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon

2010-01-22 Thread Robert Frederking
BTW, we are working on a release of a non-trivial amount of parallel 
text data in English and HC, on the same website, hopefully shortly.


http://www.speech.cs.cmu.edu/haitian/

   Bob




Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon

2010-01-22 Thread Robert Frederking
P.S.: I just verified that this is the license that has been used by the 
CMU Sphinx and Flite projects for their free software for 12 years 
without problems, with the addition of the advertising clause from BSD, 
also used without problems.  Cheers.


   Bob




Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon

2010-01-22 Thread Robert Frederking
I'm not a lawyer, but let me start by stating that our intent was simply 
that re-use included acknowledgement.  This was not intended to be a 
splash-screen on every start-up, or making the software pronounce our 
names at the start of every sentence.  :-)  It only has to be "clearly 
visible" in anyone's source files.


We aren't interested in suing people; we are a non-profit research 
organization.  But like the Regents in California, we have a 
responsibility to our sponsors that appropriate credit is given for our 
work.  So this is intended to be like the old BSD advertising clause, 
which is generally considered to be clear from a legal point of view.


Please use the data however you want; just don't say you originally 
collected it.


   Bob



Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon

2010-01-22 Thread Francis Tyers
[ Sorry in advance for cross posting ]

I'm going over this on the debian-legal mailing list (a good place to
ask about issues in free/open-source software licensing).

There is a question about clause 5 of the licence:



##  5. Any commercial, public or published work that uses this data  ##
##     must contain a clearly visible acknowledgment as to the       ##
##     provenance of the data.                                       ##



From debian-legal:

 My concern is whether, contrary to the favourable interpretation you
 give, this is intended to act like an obnoxious advertising clause.

 In other words, what will satisfy “contain” in “contain a clearly
 visible acknowledgement”? Is it sufficient for the acknowledgement to  
 be “clearly visible” only after inspecting various files in the source
 code?

 Or is the copyright holder's intent that the acknowledgement be clearly
 visible to every recipient, even those who receive a non-source form of
 the work? The latter would be a non-free restriction, like the  
 obnoxious advertising clause in the older BSD licenses.

 This looks, as it is currently worded, more like a lawyerbomb now that 
 I consider it. I would appreciate input on this from legally-trained  
 minds.



Could you confirm if that clause means that the acknowledgement should
be _clearly visible_ to _every recipient_ or would it suffice to be
visible after inspecting the source code?

Thanks for your help in this and best regards,

Francis Tyers


On Thu, 21 Jan 2010 at 22:59 -0500, Alon Lavie wrote:
> Hi Francis,
> 
> Thanks for the suggestion, but we were advised to leave the licensing 
> language as is.  Our licensing language is effectively equivalent to the 
> MIT license and is unambiguous with respect to releasing the data for 
> any use (commercial or non-commercial).
> 
> Best regards,
> 
> - *Alon*
> 
> Francis Tyers wrote:
> > On Thu, 21 Jan 2010 at 14:49 -0500, Robert Frederking wrote:
> >   
> >> The Language Technologies Institute (LTI) of Carnegie Mellon University's
> >> School of Computer Science (CMU SCS) is making publicly available the
> >> Haitian Creole spoken and text data that we have collected or produced. We
> >> are providing this data with minimal restrictions in order to
> >> allow others to develop language technology for Haiti, in parallel with our
> >> own efforts to help with this crisis. Since organizing the data in a useful
> >> fashion is not instantaneous, and more text data is currently being 
> >> produced
> >> by collaborators, we will be publishing the data incrementally on the web,
> >> as it becomes available.  To access the currently available data, please
> >> visit the website at  http://www.speech.cs.cmu.edu/haitian/
> >> 
> >
> > Would you consider also dual/triple licensing the data under an existing
> > free software licence, such as the MIT licence[1] or the GNU GPL[2] ?
> > This way it could be combined with existing data under these licences
> > (e.g. the majority of free/open-source software) and researchers and
> > developers don't need to hire legal advice to determine if they can
> > combine their work with yours.
> >
> > Best regards, 
> >
> > Fran
> >
> > 1. http://en.wikipedia.org/wiki/MIT_Licence#License_terms
> > 2. http://www.gnu.org/licenses/gpl.html
> >
> > ___
> > Mt-list mailing list
> >
> >   




[Mt-list] DCU MT GROUP RELEASES FREE/OPEN-SOURCE EBMT SYSTEM ‘MARCLATOR’

2010-01-22 Thread Mikel L. Forcada
DCU MT GROUP RELEASES FREE/OPEN-SOURCE EBMT SYSTEM ‘MARCLATOR’

The Centre for Next Generation Localisation’s (CNGL) Machine Translation group, led by Prof.
Andy Way at Dublin City University (DCU), announces the release of ‘Marclator’
(Marker-based Translator), a free/open-source system for Example Based Machine
Translation (EBMT).  This release coincides with the 4th MT Marathon, a
week-long event being hosted January 25th-30th by the CNGL and the National
Centre for Language Technology (NCLT) at DCU in conjunction with the EuroMatrix+
project, where over 100 participants from 20 countries will have a chance to
test and program open-source MT tools and systems.

The Marclator EBMT system release includes a fully functional marker-based
chunker/tagger (based on Green’s “marker hypothesis”) with markers for some
languages and a chunk aligner, as well as a proof-of-concept ‘naïve’ (monotone)
recombination module or ‘decoder’.
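
As an illustration only, and not taken from the Marclator code itself, the
idea behind marker-based chunking can be sketched in a few lines of Python.
The sketch assumes the usual reading of the marker hypothesis: a new chunk
starts at each closed-class "marker" word, provided the chunk being closed
already contains at least one non-marker word. The marker list and the
function name marker_chunk are hypothetical; a real system would use much
fuller, per-language marker lists.

```python
# Hypothetical, deliberately tiny English marker list (closed-class words).
# This is an assumption for illustration, not the list shipped with Marclator.
MARKERS = {"the", "a", "an", "in", "on", "of", "to", "with", "and"}


def marker_chunk(tokens, markers=MARKERS):
    """Split a token list into chunks that begin at marker words.

    A marker word opens a new chunk only if the current chunk already
    holds at least one non-marker word, so runs of markers stay together.
    """
    chunks = []
    current = []
    has_content = False  # does `current` contain a non-marker word yet?
    for tok in tokens:
        is_marker = tok.lower() in markers
        if is_marker and has_content:
            # Close the current chunk and start a new one at this marker.
            chunks.append(current)
            current = []
            has_content = False
        current.append(tok)
        if not is_marker:
            has_content = True
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in marker_chunk("the dog sat on the mat in the house".split()):
        print(" ".join(chunk))
```

With this toy marker list, the sentence "the dog sat on the mat in the house"
is split into "the dog sat", "on the mat" and "in the house". A sentence that
ends in a marker word leaves a final chunk with no content word; how a real
chunker treats that case is not specified here.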

This free/open-source release results from collaboration with Prof. Mikel L.
Forcada of Universitat d’Alacant in Spain who is currently a visiting researcher
within the CNGL MT group at DCU through an ETS Walton Award from Science
Foundation Ireland (SFI).

Through SFI funding of the Centre for Next Generation Localisation and
additional funding from EU FP7 research projects currently coming on stream, DCU
now boasts one of the largest academic research groups focused on MT worldwide.
The Marclator release is seen as a ‘first-step’ in a strategy of participation
in the free/open-source community in parallel with a programme of commercial
engagement with companies interested in adopting, tuning and deploying machine
translation technology.

Over the past several years, Prof. Andy Way has led the MT group at DCU in
pursuing corpus-based approaches to MT, which have culminated in the MaTrEx
system, a modular, maintainable and efficient data-driven machine translation
system which combines example-based machine translation (EBMT) and statistical
machine translation (SMT) and which consistently ranks as one of the
top-performing MT systems in open machine translation evaluations (e.g. WMT-09,
IWSLT-09, etc.).

As a follow-on to the Marclator release, Prof. Way and Prof. Forcada will
continue to collaborate toward a free/open-source release of a baseline MaTrEx
system, combining Marclator with the Moses SMT decoder.  This OpenMaTrEx release
is anticipated for Spring 2010.

Resources:  
http://www.cngl.ie
http://nclt.dcu.ie/mt
http://www.computing.dcu.ie/~mforcada/fosmt.html
http://www.computing.dcu.ie/~mforcada/marclator.html
http://www.euromatrixplus.net/
http://www.mtmarathon2010.info/web/Welcome.html

For more information please contact: i...@cngl.ie



Mikel L. Forcada 
Dept. Llenguatges i Sistemes Informàtics
Universitat d'Alacant, E-03071 Alacant (Spain)
Tel.: +34 96 590 9776  Fax: +34 96 590 9326


Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon

2010-01-22 Thread Mikel L. Forcada

Glen, MT-listers:

Glen wrote:

> Let's not inhibit folks from doing good by making it hard to do so!
  
I don't think Fran and I are in any way inhibiting anyone from doing 
good. We are just suggesting ways in which good can be done more 
effectively, by allowing the use of these resources with the existing 
free/open-source tools (for a list of free/open-source MT software, see 
http://computing.dcu.ie/~mforcada/fosmt.html). In fact, some of us were 
planning on coordinating some work to build a preliminary 
free/open-source English-Haitian MT system during the MT Marathon 
(http://www.mtmarathon2010.info) next week, using available 
free/open-source resources, hence our concerns.


Best,

Mikel L. Forcada