Fwd: FW: October 2016 Newsletter -- LDC

2016-10-19 Thread lewis john mcgibbney
Hi Folks,
For anyone with access to LDC, it looks like there could be a really cool
Chinese --> English parallel sentence dataset.
Once I've finished my current batch of work (Russian) I'm going to have a
look at the dataset.
Lewis

-- Forwarded message --
From: Mcgibbney, Lewis J (398M) 
Date: Wed, Oct 19, 2016 at 2:15 PM
Subject: FW: October 2016 Newsletter -- LDC
To: "lewis.mcgibb...@gmail.com" 









*From: *Ldc-customers1  on behalf of
Penn LDC 
*Date: *Wednesday, October 19, 2016 at 6:32 AM
*To: *Penn LDC 
*Subject: *October 2016 Newsletter -- LDC



October 2016 Newsletter – LDC

*In this newsletter:*

*Fall 2016 LDC Data Scholarship recipients*

*Chilin HK and LDC partner on distribution of parallel patent data*

*New publications:*

IARPA Babel Turkish Language Pack IARPA-babel105b-v0.5

KAFD: Arabic Font Database

Richer Event Description

*Fall 2016 LDC Data Scholarship recipients*

Congratulations to the recipients of LDC's Fall 2016 data scholarships:

Tiba Zaki Abdulhameed: Western Michigan University (USA); PhD Candidate,
Computer Science. Tiba is awarded copies of GALE Phase 2 Arabic Broadcast
Conversation Speech and Transcripts for her research in dialectal ASR.

Abhishek Abhishek: Indian Institute of Technology Guwahati (India); PhD
Candidate, Computer Science and Engineering. Abhishek is awarded copies of
ACE 2004 Multilingual Training Corpus and The New York Times Annotated
Corpus for his research in coreference resolution and relation extraction.

Sara Ebrahim: Ain Shams University (Egypt); MSc, Computer Science. Sara is
awarded copies of LDC Standard Arabic Morphological Analyzer and NIST
OpenMT 2008 Evaluation Selected References and System Translations for her
work in machine translation.

Katherine Metcalf: Indiana University (USA); PhD Candidate, Computer
Science. Katherine is awarded a copy of Emotional Prosody Speech and
Transcripts for her research in acoustic/prosodic approaches to classifying
emotional states.

Mousmita Sarma: Gauhati University (India); Post-Masters Research,
Electronics and Communications Technology. Mousmita is awarded copies of
Switchboard 1-Release 2 and IARPA Babel Assamese Language Pack for her
research in Assamese dialect identification.

For program information, visit the Data Scholarship page.



*Chilin HK and LDC partner on distribution of parallel patent data*

Chilin HK Limited (Chilin) and LDC are pleased to announce that the
parallel data source developed by Chilin, A Corpus of Chinese-English
Parallel Sentences Extracted from Patents, is now available through the LDC
Catalog. This is a special release in addition to the LDC scheduled corpora
for membership year 2016, available under separate terms.

The Chilin corpus derives primarily from the training corpus and test sets
developed specifically for the Tokyo-based NTCIR 2009 & 2010 competitions
on Patent MT (machine translation), which drew more than 30 international
teams:

NTCIR-9: http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings9/NTCIR/01-NTCIR9-PATENTMT-GotoI.pdf

NTCIR-10: http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings10/pdf/NTCIR/OVERVIEW/01-NTCIR10-PATENTMT-GotoI.pdf

The training corpus is drawn from a much larger curated corpus of parallel
Chinese-English sentences and sentence fragments which have been winnowed
from an even larger corpus of more than 300k parallel Chinese-English
patents in different fields, initially at the Research Centre on Language
Information Sciences, City University of Hong Kong (authors: Benjamin
Tsou, Bin Lu, and Kapo Chow). This data set is available from LDC under the
following reference:

LDC2016T22   A Corpus of Chinese-English Parallel Sentences Extracted from
Patents 

Not-for-profit organizations may license this data set for US$25.00 under
the LDC Not-for-Profit Membership Agreement or under the LDC User Agreement
for Non-Members for use in linguistic research, education and
non-commercial technology development. For-profit organizations may license
this data for US$5000, discounted to US$4000 for LDC for-profit members,
under a commercial license.

*New Corpora*

(1) IARPA Babel Turkish Language Pack IARPA-babel105b-v0.5

Re: Joshua 6.1

2016-10-19 Thread Tommaso Teofili
On Wed, Oct 19, 2016 at 14:59, Matt Post wrote:

> - I'm happy to release the language packs with an Apache 2.0 license.
>

glad to hear this :)


>
> - It looks like there's quite a bit of paperwork involved with the
> release. Is anyone available to help out with this or even head it up?
>

I think I can help.


>
> - We ran into a hitch building language packs, but have resumed and most
> of them are almost done. We should have over 60.
>

awesome news!


>
> - Meanwhile the world is changing and the neural approach is becoming more
> and more obviously the right thing to do. I have some ideas on how this
> fits in to Joshua which I'll send out in another email.
>

looking forward to it.

Regards,
Tommaso




Re: Joshua 6.1

2016-10-19 Thread Matt Post
- I'm happy to release the language packs with an Apache 2.0 license.

- It looks like there's quite a bit of paperwork involved with the release. Is 
anyone available to help out with this or even head it up?

- We ran into a hitch building language packs, but have resumed and most of 
them are almost done. We should have over 60.

- Meanwhile the world is changing and the neural approach is becoming more and 
more obviously the right thing to do. I have some ideas on how this fits in to 
Joshua which I'll send out in another email.

matt



> On Oct 16, 2016, at 4:40 PM, lewis john mcgibbney  wrote:
> 
> Hi Matt,
> I like the sound of this :)
> 
> On Fri, Oct 14, 2016 at 9:25 AM, <
> dev-digest-h...@joshua.incubator.apache.org> wrote:
> 
>> 
>> From: Matt Post 
>> To: dev@joshua.incubator.apache.org
>> Cc:
>> Date: Thu, 13 Oct 2016 12:58:47 -0400
>> Subject: Joshua 6.1
>> Hi folks,
>> 
>> I think I'm going to do the 6.1 release tomorrow. Any objections?
>> 
> 
> No, none at all!
> 
> 
>> 
>> Along with the release will be about 60 language packs for a large range
>> of languages. These will be released early next week and will be built on
>> BerkeleyLM, so that there are no external dependencies.
>> 
> 
> Sounds grand. As stated, it would be really cool if these could also be
> ALv2.0 licensed.
> 
> 
>> 
>> I'd like to push out the release quietly until the language packs are
>> ready, uploaded, and linked.
>> 
> 
> Cool.
> 
> 
>> 
>> Is there anything I need to know to do an Apache release?
>> 
>> 
> Yes a few things. You can see the incubator release checklist at
> http://incubator.apache.org/guides/releasemanagement.html#check-list
> There is also some more general documentation available at
> http://incubator.apache.org/guides/graduation.html#releases, which will
> eventually lead you to the release check list anyways.
> If you have any issues then let's hash them out on this thread. Please note
> that we need to review and VOTE prior to anything being pushed. We then
> need to go to the Incubator PMC to get wider approval before shipping the
> release. This 'can' be a bit painful... however from experience, if we 1)
> document the release management procedure on our wiki, and 2) iron out any
> issues within dev@joshua before we go to general@incubator then I am sure
> we will not encounter too many issues.
> Lewis