Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon

2010-01-23 Thread Robert Frederking
Well, my understanding is that, unfortunately, most companies won't 
touch anything that's under GPL, so I don't think that's a solution.  We 
don't want to exclude commercial entities.


   Bob

Francis Tyers wrote:

First of all, thanks to CMU for releasing the data. I've no doubt it
will be valuable to people working in the field.

I don't particularly like terms like lawyerbomb and obnoxious
advertising clause, but this merits a response.

People who don't get paid to work on the software they develop, and who
aren't employed by big universities or companies, are understandably
concerned about getting sued. You can say "but they've never been sued
before, so why should they worry?", but this isn't really convincing. They
can also get frustrated that people make more work for themselves and others.

* Making up your own 'free/open-source' licence: 
More work for you, more work for them.


* Choosing an existing tried and tested 'free/open-source' licence: 
Less work for you, less work for them.


Furthermore, they can also find it frustrating that a non-profit
organisation would release their work under a licence that is
incompatible with that of over 60% of free software.[1]

Fran

PS. Some of these same issues are reviewed in Ted Pedersen's excellent
2008 article:
http://www.d.umn.edu/~tpederse/Pubs/pedersen-last-word-2008.pdf

=Notes=

1. http://www.blackducksoftware.com/oss/licenses#top20

On Fri, 22 Jan 2010 at 18:29 -0500, Job M. van Zuijlen wrote:
  

Some of the verbiage used in this discussion (lawyer bomb...) doesn't
particularly encourage people to make their data freely available.
What happened to common sense?  I think CMU's initiative should be
commended.
 
Job van Zuijlen



From: Robert Frederking 
Sent: Friday, January 22, 2010 16:32
To: Francis Tyers 
Cc: mt-list@eamt.org 
Subject: Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon


I'm not a lawyer, but let me start by stating that our intent was
simply that re-use included acknowledgement.  This was not intended to
be a splash-screen on every start-up, or making the software pronounce
our names at the start of every sentence.  :-)  It only has to be
clearly visible in anyone's source files.

We aren't interested in suing people; we are a non-profit research
organization.  But like the Regents in California, we have a
responsibility to our sponsors that appropriate credit is given for
our work.  So this is intended to be like the old BSD advertising
clause, which is generally considered to be clear from a legal point
of view. 


Please use the data however you want; just don't say you originally
collected it.

Bob

Francis Tyers wrote: 


[ Sorry in advance for cross posting ]

I'm going over this on the debian-legal mailing list (a good place to
ask about issues in free/open-source software licensing).

There is a question about clause 5 of the licence:



##  5. Any commercial, public or published work that uses this data  ##
##     must contain a clearly visible acknowledgment as to the       ##
##     provenance of the data.                                       ##



From debian-legal:

 My concern is whether, contrary to the favourable interpretation you
 give, this is intended to act like an obnoxious advertising clause.

 In other words, what will satisfy “contain” in “contain a clearly
 visible acknowledgement”? Is it sufficient for the acknowledgement to
 be “clearly visible” only after inspecting various files in the source
 code?

 Or is the copyright holder's intent that the acknowledgement be clearly
 visible to every recipient, even those who receive a non-source form of
 the work? The latter would be a non-free restriction, like the  
 obnoxious advertising clause in the older BSD licenses.


 This looks, as it is currently worded, more like a lawyerbomb now that 
 I consider it. I would appreciate input on this from legally-trained  
 minds.




Could you confirm if that clause means that the acknowledgement should
be _clearly visible_ to _every recipient_ or would it suffice to be
visible after inspecting the source code?

Thanks for your help in this and best regards,

Francis Tyers


On Thu, 21 Jan 2010 at 22:59 -0500, Alon Lavie wrote:
  
  

Hi Francis,

Thanks for the suggestion, but we were advised to leave the licensing 
language as is.  Our licensing language is effectively equivalent to the 
MIT license and is unambiguous with respect to releasing the data for 
any use (commercial or non-commercial).


Best regards,

- *Alon*

Francis Tyers wrote:



On Thu, 21 Jan 2010 at 14:49 -0500, Robert Frederking wrote:
  
  
  

The Language Technologies Institute (LTI

Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon

2010-01-21 Thread Vadim Berman

Hi Robert,

These are commendable efforts, but isn't French the principal written 
language in Haiti? Or are you talking about a speech-to-speech system?


Best regards,
Vadim

- Original Message - 
From: Robert Frederking r...@cs.cmu.edu
To: mt_l...@nist.gov; mt-list@eamt.org
Sent: Friday, January 22, 2010 6:49 AM
Subject: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon




The Language Technologies Institute (LTI) of Carnegie Mellon University's
School of Computer Science (CMU SCS) is making publicly available the
Haitian Creole spoken and text data that we have collected or produced. We
are providing this data with minimal restrictions in order to
allow others to develop language technology for Haiti, in parallel with our
own efforts to help with this crisis. Since organizing the data in a useful
fashion is not instantaneous, and more text data is currently being produced
by collaborators, we will be publishing the data incrementally on the web,
as it becomes available.  To access the currently available data, please
visit the website at  http://www.speech.cs.cmu.edu/haitian/

___
Mt-list mailing list 


Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon

2010-01-21 Thread Robert Frederking

Hi Vadim,
   Yes, French is the principal written language, but most of the 
population only speaks Creole (and is illiterate).  We ourselves are 
indeed looking at making speech-based systems (and the rarest part of 
the data may be the speech data).  There may also be unforeseen benefits 
to the data being available.  For example, it appears that Doctors 
Without Borders (Médecins Sans Frontières) may make use of the bilingual 
medical phrases as-is, through Translators Without Borders (Traducteurs 
sans Frontières).  So who knows how this may help.  Cheers.


   Bob


Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon

2010-01-21 Thread Jeff Allen
Sorry I couldn't intervene quickly enough guys.  Dealing with lots of requests
from people about the announcement.

Bob has mentioned several of the things we are looking at right now, and
concrete ways to meet specific needs.  I'm on several networks with many
translation agencies and associations of translators/interpreters, with
confcalls and email communication on a very regular basis.  I am corresponding
with many Haitians who want to translate and do interpretation for their fellow
countrymen.  The flow of requests for translation from/into so many languages
for this crisis is astounding.   Current estimates are that potentially
40,000 NGOs will be involved in assisting in this disaster relief effort. I'm
hearing now of teams being deployed from Brazil, Bolivia and the Philippines.
Cargo ships are coming from Italy now as well.

The translation and interpretation needs vary based on type of content, the
intended speaker and receiver of the message, means of communication. It is not
a one-size-fits-all communication need right now.

As for the languages in Haiti: many of my publications concerning Haitian
Creole language and technologies (https://www.box.net/shared/bz4sq9jx88) also
include descriptions of the sociolinguistic factors which affect the approach
and means of implementing technologies.

Both French and Haitian Creole are official languages in Haiti, both spoken and
written forms.  The IPN spelling system for Haitian Creole adopted in 1979 was
in fact an Orthography Law.  Experts have documented 10+ different spelling
systems over the history of the language, and that does not include the hybrid
forms.
The majority of the Haitian population is illiterate. Many reports up to the end
of the 90s put illiteracy at 80-90% in any language.
When I was on a trip for CMU in 1998, all of the students we were recording for
the speech data could read and write (as well as speak) in Haitian Creole very
well, compared with the reading level of Haitians of the diaspora whom I had
recorded in 1997 on other trips in the US and France.

Communication with regard to crowd control, medical treatment and other areas is
much more effective in Haitian Creole. Other types of communication between
the participating NGOs and other organizations would be from/to French.  Much
depends on the purpose of the communication and the participants.   In this time
of psychological and physical trauma, it is much more effective to speak to the
Haitians in their original mother tongue (Haitian Creole). A small part of the
population does grow up only hearing French at home, and many speak Creole until
they start going to school and then learn French, just as is the case in various
African countries.

It is sad that it took a disaster for Haitian Creole to receive so much
recognition as a language.

I'm not subscribed to mt_l...@nist.gov, so can someone please repost this to
that list, since it will certainly bounce back to me.

Jeff

http://www.linkedin.com/in/jeffallen
