---------- Forwarded message ---------
From: Mcgibbney, Lewis J (172B) <lewis.j.mcgibb...@jpl.nasa.gov>
Date: Fri, Nov 15, 2019 at 12:23 PM
Subject: FW: [EXTERNAL] November 2019 Newsletter - LDC
To: lewis john mcgibbney <lewi...@apache.org>

Dr. Lewis John McGibbney Ph.D., B.Sc.(Hons)

Enterprise Search Technologist

Web and Mobile Application Development Group (172B)

Application, Consulting, Development and Engineering Section (1722)

Info & Engineering Technology Planning and Development Division (1720)

Jet Propulsion Laboratory

California Institute of Technology

4800 Oak Grove Drive

Pasadena, California 91109-8099

Mail Stop : 600-172A

Tel:  (+1) (818)-393-7402

Cell: (+1) (626)-487-3476

Fax:  (+1) (818)-393-1190

Email: lewis.j.mcgibb...@jpl.nasa.gov

ORCID: orcid.org/0000-0003-2185-928X

 Dare Mighty Things

*From: *Ldc-customers1 <ldc-customers1-boun...@ldc.upenn.edu> on behalf of
Penn LDC <l...@ldc.upenn.edu>
*Date: *Friday, November 15, 2019 at 10:38 AM
*To: *Penn LDC <l...@ldc.upenn.edu>
*Subject: *[EXTERNAL] November 2019 Newsletter - LDC

*In this newsletter:*
*Join LDC for Membership Year 2020*
*Spring 2020 Data Scholarship Program*

*New Publications:*

DEFT English Committed Belief Annotation
<https://catalog.ldc.upenn.edu/LDC2019T16>
CALLFRIEND American English-Non-Southern Dialect Second Edition
<https://catalog.ldc.upenn.edu/LDC2019S21>
TAC KBP Cold Start - Comprehensive Evaluation Data 2012-2017
<https://catalog.ldc.upenn.edu/LDC2019T17>
IARPA Babel Amharic Language Pack IARPA-babel307b-v1.0b
<https://catalog.ldc.upenn.edu/LDC2019S22>

*Join LDC for Membership Year 2020*

Membership Year 2020 (MY2020) is open, and discounts are available for
those who keep their membership current and join early in the year.
Current MY2019 members who renew their LDC membership before March 2, 2020
will receive a 10% discount off the membership fee. New or returning
organizations will receive a 5% discount through March 2, 2020.

In addition to receiving new publications, current LDC members also enjoy
the benefit of licensing older data at reduced costs from our Catalog of
over 800 holdings. Current-year for-profit members may use most data for
commercial applications.

Plans for MY2020 publications are in progress. Among the expected releases
are:

·         *Abstract Meaning Representation (AMR) Annotation Release 3.0*:
semantic treebank of over 59,000 English natural language sentences from
broadcast conversations, newswire, weblogs and web discussion forums;
updates the second version (LDC2017T10
<https://catalog.ldc.upenn.edu/LDC2017T10>) with new annotations

·         *TAC KBP:* English sentiment slot filling, surprise slot filling,
nugget detection and coreference, and event argument data in all languages
(English, Chinese and Spanish)

·         *DEFT Chinese ERE:* Chinese discussion forum data annotated for
entities, relations and events

·         *LibriVox Spanish:* 73 hours of Spanish audiobook read speech and
transcripts

·         *IARPA Babel Language Packs *(telephone speech and transcripts):
languages include Dhuluo, Javanese and Mongolian

·         *HAVIC MED Training data:* web video, metadata, and annotations
for developing multimedia systems

·         *RATS Speaker Identification:* conversational telephone speech in
Levantine Arabic, Pashto, Urdu, Farsi and Dari on degraded audio signals
with annotation of speech segments for speaker identification

·         *BOLT:* discussion forums, SMS/chat, conversational telephone
speech, word-aligned, tagged and coreference data in all languages
(Chinese, Egyptian Arabic, and English)

It’s also not too late to join for MY2018 (through December 31, 2019) and
MY2019 (through December 31, 2020). Data sets from those years include
Concretely Annotated New York Times and English Gigaword, DIRHA English WSJ
Audio, BOLT English Treebank – Discussion Forum, First DIHARD Challenge
Development and Evaluation releases, Penn Discourse Treebank Version 3.0,
and 2016 NIST Speaker Recognition Evaluation Test Set.

For full descriptions of all LDC data sets, browse our Catalog
<https://catalog.ldc.upenn.edu/>.

Visit Join LDC <https://www.ldc.upenn.edu/members/join-ldc> for details on
membership, user accounts and payment.

*Spring 2020 Data Scholarship Program*

Applications are now being accepted through January 15, 2020 for the
Spring 2020 LDC Data Scholarship program, which provides university
students with no-cost access to LDC data. Consult the LDC Data Scholarship
<https://www.ldc.upenn.edu/language-resources/data/data-scholarships> page
for more information about program rules and submission requirements.

*New publications:*

(1) DEFT English Committed Belief Annotation
<https://catalog.ldc.upenn.edu/LDC2019T16> was developed by LDC and
consists of approximately 950,000 words of English discussion forum text
annotated for "committed belief," which marks the level of commitment
displayed by the author to the truth of the propositions expressed in the
text.

DARPA's Deep Exploration and Filtering of Text (DEFT) program aimed to
address remaining capability gaps in state-of-the-art natural language
processing technologies related to inference, causal relationships, and
anomaly detection. LDC supported the DEFT program by collecting, creating,
and annotating a variety of data sources.

DEFT English Committed Belief Annotation is distributed via web download.

2019 Subscription Members will automatically receive copies of this corpus.
2019 Standard Members may request a copy as part of their 16 free
membership corpora. Non-members may license this data for $2000.

(2) CALLFRIEND American English-Non-Southern Dialect Second Edition
<https://catalog.ldc.upenn.edu/LDC2019S21> was developed by LDC and
consists of approximately 26 hours of unscripted telephone conversations
between native speakers of non-Southern dialects of American English. This
second edition updates the audio files to WAV format, simplifies the
directory structure, and adds documentation and metadata. The first edition
is available as CALLFRIEND American English-Non-Southern Dialect (LDC96S46
<https://catalog.ldc.upenn.edu/LDC96S46>).

All data was collected before July 1997. Participants could speak with a
person of their choice on any topic; most called family members and
friends. All calls originated in North America. The recorded conversations
last up to 30 minutes.

CALLFRIEND American English-Non-Southern Dialect Second Edition is
distributed via web download.

2019 Subscription Members will automatically receive copies of this corpus.
2019 Standard Members may request a copy as part of their 16 free
membership corpora. Non-members may license this data for $1000.

(3) TAC KBP Cold Start - Comprehensive Evaluation Data 2012-2017
<https://catalog.ldc.upenn.edu/LDC2019T17> was developed by LDC and
contains Chinese, English, and Spanish data produced in support of the TAC
KBP Cold Start evaluation track conducted from 2012
<https://tac.nist.gov/2012/KBP/index.html> to 2017
<https://tac.nist.gov/2017/KBP/ColdStart/index.html>. This corpus includes
source documents, queries, assessments, manual runs, and final assessments.

In the Cold Start track, systems were evaluated on their ability to
construct a new knowledge base (KB) from information provided in a text
collection, drawing on technologies developed in other TAC KBP tracks:
slot filling, information extraction, question answering, and entity
discovery and linking. Cold Start systems were required to find all
entities in the text; ideally, the resulting KB would include every
person, organization, and geo-political entity, as well as all the
targeted relations between them. To facilitate the evaluation of those
KBs, LDC annotators created sets of queries, human-generated responses to
the queries, and assessments of both human and system responses.

The source data in this release consists of English and Spanish
newswire and web text collected by LDC for the 2012, 2014, and 2015
evaluations, and the 2016 pilot collection. The source collections for the
2016 and 2017 evaluations, which include Chinese data, are available in TAC
KBP Evaluation Source Corpora 2016-2017 (LDC2019T12
<https://catalog.ldc.upenn.edu/LDC2019T12>). The archived 2013 Cold Start
source data collection is available from NIST upon request.

TAC KBP Cold Start - Comprehensive Evaluation Data 2012-2017 is distributed
via web download.

2019 Subscription Members will automatically receive copies of this corpus.
2019 Standard Members may request a copy as part of their 16 free
membership corpora. Non-members may license this data for $1000.

(4) IARPA Babel Amharic Language Pack IARPA-babel307b-v1.0b
<https://catalog.ldc.upenn.edu/LDC2019S22> was developed by Appen
<http://www.appen.com/> for the IARPA (Intelligence Advanced Research
Projects Activity) Babel
<http://www.iarpa.gov/index.php/research-programs/babel> program. It
contains approximately 204 hours of Amharic conversational and scripted
telephone speech collected in 2014 along with corresponding transcripts.

The Amharic speech in this release represents the Addis Ababa, Shewa, and
Gondar dialect regions of Ethiopia. The gender distribution among speakers
is approximately equal; speakers' ages range from 16 years to 60 years.
Calls were made using different telephones (e.g., mobile, landline) from a
variety of environments including the street, a home or office, a public
place, and inside a vehicle.

IARPA Babel Amharic Language Pack IARPA-babel307b-v1.0b is distributed via
web download.

2019 Subscription Members will automatically receive copies of this corpus.
2019 Standard Members may request a copy as part of their 16 free
membership corpora. Non-members may license this data for $25.

Membership Office

Linguistic Data Consortium <http://ldc.upenn.edu>

University of Pennsylvania

T: +1-215-573-1275

E: l...@ldc.upenn.edu

M: 3600 Market St. Suite 810

      Philadelphia, PA 19104

-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc