Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-02-12 Thread Lewis John Mcgibbney
Hi Chris,
Is it time to close out this VOTE and bring Joshua on board?
Lewis

On Wed, Feb 3, 2016 at 4:01 PM, <general-digest-h...@incubator.apache.org>
wrote:

>
> From: Danese Cooper <dan...@gmail.com>
> To: "general@incubator.apache.org" <general@incubator.apache.org>
> Cc: "p...@cs.jhu.edu" <p...@cs.jhu.edu>
> Date: Wed, 3 Feb 2016 07:43:11 -0800
> Subject: Re: [VOTE] Accept Joshua as an Apache Incubator Podling
> +1 (binding) Accept Joshua as an Apache Incubator podling.
>
> D
>
> > On Jan 30, 2016, at 12:00 PM, Mattmann, Chris A (3980) <
> chris.a.mattm...@jpl.nasa.gov> wrote:
> >
> > Hi Everyone,
> >
> > OK the discussion is now completed. Please VOTE to accept Joshua
> > into the Apache Incubator. I’ll leave the VOTE open for at least
> > the next 72 hours, with hopes to close it next Friday the 5th of
> > February, 2016.
> >
> > [ ] +1 Accept Joshua as an Apache Incubator podling.
> > [ ] +0 Abstain.
> > [ ] -1 Don’t accept Joshua as an Apache Incubator podling because..
> >
> > Of course, I am +1 on this. Please note VOTEs from Incubator PMC
> > members are binding but all are welcome to VOTE!
> >
> > Cheers,
> > Chris
> >
> > ++
> > Chris Mattmann, Ph.D.
> > Chief Architect
> > Instrument Software and Science Data Systems Section (398)
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 168-519, Mailstop: 168-527
> > Email: chris.a.mattm...@nasa.gov
> > WWW:  http://sunset.usc.edu/~mattmann/
> > ++
> > Adjunct Associate Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > ++
> >
> >
> >
> >
> >
> > -----Original Message-----
> > From: jpluser <chris.a.mattm...@jpl.nasa.gov>
> > Date: Tuesday, January 12, 2016 at 10:56 PM
> > To: "general@incubator.apache.org" <general@incubator.apache.org>
> > Cc: "p...@cs.jhu.edu" <p...@cs.jhu.edu>
> > Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation
> > Toolkit
> >
> >> Hi Everyone,
> >>
> >> Please find attached for your viewing pleasure a proposed new project,
> >> Apache Joshua, a statistical machine translation toolkit. The proposal
> >> is in wiki draft form at:
> https://wiki.apache.org/incubator/JoshuaProposal
> >>
> >> Proposal text is copied below. I’ll leave the discussion open for a
> week
> >> and we are interested in folks who would like to be initial committers
> >> and mentors. Please discuss here on the thread.
> >>
> >> Thanks!
> >>
> >> Cheers,
> >> Chris (Champion)
> >>
> >> ———
> >>
> >> = Joshua Proposal =
> >>
> >> == Abstract ==
> >> [[joshua-decoder.org|Joshua]] is an open-source statistical machine
> >> translation toolkit. It includes a Java-based decoder for translating
> with
> >> phrase-based, hierarchical, and syntax-based translation models, a
> >> Hadoop-based grammar extractor (Thrax), and an extensive set of tools
> and
> >> scripts for training and evaluating new models from parallel text.
> >>
> >> == Proposal ==
> >> Joshua is a state of the art statistical machine translation system that
> >> provides a number of features:
> >>
> >> * Support for the two main paradigms in statistical machine translation:
> >> phrase-based and hierarchical / syntactic.
> >> * A sparse feature API that makes it easy to add new feature templates
> >> supporting millions of features
> >> * Native implementations of many tuners (MERT, MIRA, PRO, and AdaGrad)
> >> * Support for lattice decoding, allowing upstream NLP tools to expose
> >> their hypothesis space to the MT system
> >> * An efficient representation for models, allowing for quick loading of
> >> multi-gigabyte model files
> >> * Fast decoding speed (on par with Moses and mtplz)
> >> * Language packs — precompiled models that allow the decoder to be
> run as
> >> a black box
> >> * Thrax, a Hadoop-based tool for learning translation models from
> >> parallel text
> >> * A suite of tools for constructing new models for any language pair for
> >> which sufficient training data exists

Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-02-12 Thread Mattmann, Chris A (3980)
Yep, will send a result shortly.

Lewis, after that, can you help me get the podling bootstrap tasks
started?

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++





-----Original Message-----
From: Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Reply-To: "general@incubator.apache.org" <general@incubator.apache.org>
Date: Friday, February 12, 2016 at 11:31 AM
To: "general@incubator.apache.org" <general@incubator.apache.org>
Subject: Re: [VOTE] Accept Joshua as an Apache Incubator Podling

>Hi Chris,
>Is it time to close out this VOTE and bring Joshua on board?
>Lewis

Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-02-12 Thread Tom Barber
You're making the presumption it's passed its vote! ;)

On Fri, Feb 12, 2016 at 7:33 PM, Mattmann, Chris A (3980) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Yep, will send a result shortly.
>
> Lewis, after that, can you help me get the podling bootstrap tasks
> started?

Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-02-12 Thread Lewis John Mcgibbney
ACK

On Fri, Feb 12, 2016 at 11:33 AM, 
wrote:

>
>
> Lewis, after that, can you help me get the podling bootstrap tasks
> started?
>
>


Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-02-12 Thread Adunuthula, Seshu
Is there a fail grade? ;)


On 2/12/16, 11:57 AM, "Tom Barber" <tom.bar...@meteorite.bi> wrote:

>You're making the presumption it's passed its vote! ;)
>
>On Fri, Feb 12, 2016 at 7:33 PM, Mattmann, Chris A (3980) <
>chris.a.mattm...@jpl.nasa.gov> wrote:
>
>> Yep, will send a result shortly.
>>
>> Lewis, after that, can you help me get the podling bootstrap tasks
>> started?

Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-02-03 Thread Tommaso Teofili
2016-02-01 16:20 GMT+01:00 Mattmann, Chris A (3980) <
chris.a.mattm...@jpl.nasa.gov>:

> Hey Jim,
>
> This is a valid concern, one that I hope is mitigated by taking
> however long it takes in Incubation to attract some new committers
> to work on the project. Hopefully too you saw how long I took to
> allow the discussion to occur and so forth.
>
> Lewis has already actively contributed to Joshua, as you can see
> from the Homebrew package he created:
>
> https://github.com/Homebrew/homebrew/pull/45746
>
>
> You can see too that it wasn’t something recent or super quick;
> it’s something he had to work at.
>
> As for me, my involvement is going to be limited, but I am
> actively pursuing Tika’s integration with Joshua as part of
> TIKA-1343: http://issues.apache.org/jira/browse/TIKA-1343.
>
> Finally my suspicion is that Tom, Henry and Tommaso will
> contribute a lot as well.
>

FWIW although I'm new to Joshua I am very interested and plan to contribute
(maybe with integrations here and there, hint hint) as much as I can.

Regards,
Tommaso


>
> Thanks for listening.
>
> Cheers,
> Chris
>
> ++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
>
>
>
>
>
> -----Original Message-----
> From: Jim Jagielski <j...@jagunet.com>
> Reply-To: "general@incubator.apache.org" <general@incubator.apache.org>
> Date: Monday, February 1, 2016 at 4:20 AM
> To: "general@incubator.apache.org" <general@incubator.apache.org>
> Cc: "p...@cs.jhu.edu" <p...@cs.jhu.edu>
> Subject: Re: [VOTE] Accept Joshua as an Apache Incubator Podling
>
> >I know this is specifically called-out in the proposal, but it
> >does seem worthy of further discussion.
> >
> >This has a pretty small list of initial committers, esp when one considers
> >how over-booked 2 of them appear to be.
> >
> >So, realistically, how active do both Chris and Lewis expect
> >to be?

Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-02-03 Thread Jim Jagielski
+1 (binding)




Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-02-03 Thread Danese Cooper
+1 (binding) Accept Joshua as an Apache Incubator podling.

D

> On Jan 30, 2016, at 12:00 PM, Mattmann, Chris A (3980) 
>  wrote:
> 
> Hi Everyone,
> 
> OK the discussion is now completed. Please VOTE to accept Joshua
> into the Apache Incubator. I’ll leave the VOTE open for at least
> the next 72 hours, with hopes to close it next Friday the 5th of
> February, 2016.
> 
> [ ] +1 Accept Joshua as an Apache Incubator podling.
> [ ] +0 Abstain.
> [ ] -1 Don’t accept Joshua as an Apache Incubator podling because..
> 
> Of course, I am +1 on this. Please note VOTEs from Incubator PMC
> members are binding but all are welcome to VOTE!
> 
> Cheers,
> Chris
> 
> ++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
> 
> 
> 
> 
> 
> -----Original Message-----
> From: jpluser 
> Date: Tuesday, January 12, 2016 at 10:56 PM
> To: "general@incubator.apache.org" 
> Cc: "p...@cs.jhu.edu" 
> Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation
> Toolkit
> 
>> Hi Everyone,
>> 
>> Please find attached for your viewing pleasure a proposed new project,
>> Apache Joshua, a statistical machine translation toolkit. The proposal
>> is in wiki draft form at: https://wiki.apache.org/incubator/JoshuaProposal
>> 
>> Proposal text is copied below. I’ll leave the discussion open for a week
>> and we are interested in folks who would like to be initial committers
>> and mentors. Please discuss here on the thread.
>> 
>> Thanks!
>> 
>> Cheers,
>> Chris (Champion)
>> 
>> ———
>> 
>> = Joshua Proposal =
>> 
>> == Abstract ==
>> [[joshua-decoder.org|Joshua]] is an open-source statistical machine
>> translation toolkit. It includes a Java-based decoder for translating with
>> phrase-based, hierarchical, and syntax-based translation models, a
>> Hadoop-based grammar extractor (Thrax), and an extensive set of tools and
>> scripts for training and evaluating new models from parallel text.
>> 
>> == Proposal ==
>> Joshua is a state of the art statistical machine translation system that
>> provides a number of features:
>> 
>> * Support for the two main paradigms in statistical machine translation:
>> phrase-based and hierarchical / syntactic.
>> * A sparse feature API that makes it easy to add new feature templates
>> supporting millions of features
>> * Native implementations of many tuners (MERT, MIRA, PRO, and AdaGrad)
>> * Support for lattice decoding, allowing upstream NLP tools to expose
>> their hypothesis space to the MT system
>> * An efficient representation for models, allowing for quick loading of
>> multi-gigabyte model files
>> * Fast decoding speed (on par with Moses and mtplz)
>> * Language packs — precompiled models that allow the decoder to be run as
>> a black box
>> * Thrax, a Hadoop-based tool for learning translation models from
>> parallel text
>> * A suite of tools for constructing new models for any language pair for
>> which sufficient training data exists
>> 
>> == Background and Rationale ==
>> A number of factors make this a good time for an Apache project focused on
>> machine translation (MT): the quality of MT output (for many language
>> pairs); the average computing resources available on computers, relative
>> to the needs of MT systems; and the availability of a number of
>> high-quality toolkits, together with a large base of researchers working
>> on them.
>> 
>> Over the past decade, machine translation (MT; the automatic translation
>> of one human language to another) has become a reality. The research into
>> statistical approaches to translation that began in the early nineties,
>> together with the availability of large amounts of training data, and
>> better computing infrastructure, have all come together to produce
>> translations results that are “good enough” for a large set of language
>> pairs and use cases. Free services like
>> [[https://www.bing.com/translator|Bing Translator]] and
>> [[https://translate.google.com|Google Translate]] have made these services
>> available to the average person through direct interfaces and through
>> tools like browser plugins, and sites across the world with higher
>> translation needs use them to translate their pages through automatically.
>> 
>> MT does not require the infrastructure of large corporations in order to
>> produce feasible output. Machine translation can be 

Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-02-02 Thread Henri Yandell
I'm more likely to guide contributions from my employer. There have been
some contributions thus far, and there is interest in putting more day-job
time into contributing, but currently there's no coder who is personally
committed to the project.

Hen

On Mon, Feb 1, 2016 at 7:20 AM, Mattmann, Chris A (3980) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Hey Jim,
>
> This is a valid concern, one that I hope is mitigated by taking
> however long it takes in Incubation to attract some new committers
> to work on the project. Hopefully too you saw how long I took to
> allow the discussion to occur and so forth.
>
> Lewis has already actively contributed to Joshua, as you can see
> from the Homebrew package he created:
>
> https://github.com/Homebrew/homebrew/pull/45746
>
>
> You can see too that it wasn’t something recent or super quick;
> it’s something he had to work at.
>
> As for me, my involvement is going to be limited, but I am
> actively pursuing Tika’s integration with Joshua as part of
> TIKA-1343: http://issues.apache.org/jira/browse/TIKA-1343.
>
> Finally my suspicion is that Tom, Henry and Tommaso will
> contribute a lot as well.
>
> Thanks for listening.
>
> Cheers,
> Chris

Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-02-02 Thread Seetharam Venkatesh
+1 (binding).

Thanks!

On Tue, Feb 2, 2016 at 2:06 PM Henri Yandell <bay...@apache.org> wrote:

> I'm more likely to guide contributions from my employer. There have been
> some contributions thus far, and there is interest in putting more day-job
> time into contributing, but currently there's no coder who is personally
> committed to the project.
>
> Hen

Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-02-02 Thread Lewis John Mcgibbney
Hi Chris,

[X] +1 Accept Joshua as an Apache Incubator podling.

@JimJag,
Yep I agree very valid concern.
Since becoming involved in Joshua I've addressed some 15 or so
issues/source code commits... which actually makes me the 6th most active
person on the project based on the amount of time it's been on Github :)
On a serious note, I can see my enthusiasm and contributions in the space
certainly lasting throughout incubation with the primary role of
evangelizing the sh*t out of the project.
We know a list of people who require SMT right now; bringing Joshua into
the Incubator will enable us to address a roadmap of baking it in to
Tika as indicated by Chris.
The brew Formula I put together makes installation of Joshua as easy as
opening 12 cans of lager and painting the town red so I hope that we will
further grow the user base throughout incubation.

Thanks Chris and Matt for working to put together the proposal.
Lewis

On Sat, Jan 30, 2016 at 1:29 PM, 
wrote:

>
> Hi Everyone,
>
> OK the discussion is now completed. Please VOTE to accept Joshua
> into the Apache Incubator. I’ll leave the VOTE open for at least
> the next 72 hours, with hopes to close it next Friday the 5th of
> February, 2016.
>
> [ ] +1 Accept Joshua as an Apache Incubator podling.
> [ ] +0 Abstain.
> [ ] -1 Don’t accept Joshua as an Apache Incubator podling because..
>
> Of course, I am +1 on this. Please note VOTEs from Incubator PMC
> members are binding but all are welcome to VOTE!
>
> Cheers,
> Chris
>
>


Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-02-01 Thread Chris Douglas
+1 (binding) -C

On Sat, Jan 30, 2016 at 12:00 PM, Mattmann, Chris A (3980)
 wrote:
> Hi Everyone,
>
> OK the discussion is now completed. Please VOTE to accept Joshua
> into the Apache Incubator. I’ll leave the VOTE open for at least
> the next 72 hours, with hopes to close it next Friday the 5th of
> February, 2016.
>
> [ ] +1 Accept Joshua as an Apache Incubator podling.
> [ ] +0 Abstain.
> [ ] -1 Don’t accept Joshua as an Apache Incubator podling because..
>
> Of course, I am +1 on this. Please note VOTEs from Incubator PMC
> members are binding but all are welcome to VOTE!
>
> Cheers,
> Chris
>
> ++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
>
>
>
>
>
> -----Original Message-----
> From: jpluser 
> Date: Tuesday, January 12, 2016 at 10:56 PM
> To: "general@incubator.apache.org" 
> Cc: "p...@cs.jhu.edu" 
> Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation
> Toolkit
>
>>Hi Everyone,
>>
>>Please find attached for your viewing pleasure a proposed new project,
>>Apache Joshua, a statistical machine translation toolkit. The proposal
>>is in wiki draft form at: https://wiki.apache.org/incubator/JoshuaProposal
>>
>>Proposal text is copied below. I’ll leave the discussion open for a week
>>and we are interested in folks who would like to be initial committers
>>and mentors. Please discuss here on the thread.
>>
>>Thanks!
>>
>>Cheers,
>>Chris (Champion)
>>
>>———
>>
>>= Joshua Proposal =
>>
>>== Abstract ==
>>[[joshua-decoder.org|Joshua]] is an open-source statistical machine
>>translation toolkit. It includes a Java-based decoder for translating with
>>phrase-based, hierarchical, and syntax-based translation models, a
>>Hadoop-based grammar extractor (Thrax), and an extensive set of tools and
>>scripts for training and evaluating new models from parallel text.
>>
>>== Proposal ==
>>Joshua is a state of the art statistical machine translation system that
>>provides a number of features:
>>
>> * Support for the two main paradigms in statistical machine translation:
>>phrase-based and hierarchical / syntactic.
>> * A sparse feature API that makes it easy to add new feature templates
>>supporting millions of features
>> * Native implementations of many tuners (MERT, MIRA, PRO, and AdaGrad)
>> * Support for lattice decoding, allowing upstream NLP tools to expose
>>their hypothesis space to the MT system
>> * An efficient representation for models, allowing for quick loading of
>>multi-gigabyte model files
>> * Fast decoding speed (on par with Moses and mtplz)
>> * Language packs — precompiled models that allow the decoder to be run as
>>a black box
>> * Thrax, a Hadoop-based tool for learning translation models from
>>parallel text
>> * A suite of tools for constructing new models for any language pair for
>>which sufficient training data exists
>>
>>== Background and Rationale ==
>>A number of factors make this a good time for an Apache project focused on
>>machine translation (MT): the quality of MT output (for many language
>>pairs); the average computing resources available on computers, relative
>>to the needs of MT systems; and the availability of a number of
>>high-quality toolkits, together with a large base of researchers working
>>on them.
>>
>>Over the past decade, machine translation (MT; the automatic translation
>>of one human language to another) has become a reality. The research into
>>statistical approaches to translation that began in the early nineties,
>>together with the availability of large amounts of training data, and
>>better computing infrastructure, have all come together to produce
>>translation results that are “good enough” for a large set of language
>>pairs and use cases. Free services like
>>[[https://www.bing.com/translator|Bing Translator]] and
>>[[https://translate.google.com|Google Translate]] have made these services
>>available to the average person through direct interfaces and through
>>tools like browser plugins, and sites across the world with higher
>>translation needs use them to translate their pages automatically.
>>
>>MT does not require the infrastructure of large corporations in order to
>>produce feasible output. Machine translation can be resource-intensive,
>>but need not be prohibitively so. Disk and memory usage are mostly a
>>matter of model size, which for most language pairs 

Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-02-01 Thread Jim Jagielski
OK, cool... Just thought the topic warranted some level of
discussion ;)

> On Feb 1, 2016, at 10:31 AM, Tom Barber <tom.bar...@meteorite.bi> wrote:
> 
> Hello! I'm a code-aholic, you'll be getting regular commits from me.
> 
> Regards,
> 
> Tom
> 
> On Mon, Feb 1, 2016 at 3:20 PM, Mattmann, Chris A (3980) <
> chris.a.mattm...@jpl.nasa.gov> wrote:
> 
>> Hey Jim,
>> 
>> This is a valid concern, one that I hope is mitigated by taking
>> however long it takes in Incubation to attract some new committers
>> to work on the project. Hopefully too you saw how long I took to
>> allow the discussion to occur and so forth.
>> 
>> Lewis has actively contributed to Joshua already - you can see -
>> via the HomeBrew package he created, see:
>> 
>> https://github.com/Homebrew/homebrew/pull/45746
>> 
>> 
>> You can see too it wasn’t something just recent or something
>> super quick it’s something he had to work at.
>> 
>> As for me, my involvement is going to be limited, but I am
>> actively pursuing Tika’s integration with Joshua as part of
>> TIKA-1343: http://issues.apache.org/jira/browse/TIKA-1343.
>> 
>> Finally my suspicion is that Tom, Henry and Tommaso will
>> contribute a lot as well.
>> 
>> Thanks for listening.
>> 
>> Cheers,
>> Chris
>> 
>> ++
>> Chris Mattmann, Ph.D.
>> Chief Architect
>> Instrument Software and Science Data Systems Section (398)
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 168-519, Mailstop: 168-527
>> Email: chris.a.mattm...@nasa.gov
>> WWW:  http://sunset.usc.edu/~mattmann/
>> ++
>> Adjunct Associate Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++
>> 
>> 
>> 
>> 
>> 
>> -----Original Message-----
>> From: Jim Jagielski <j...@jagunet.com>
>> Reply-To: "general@incubator.apache.org" <general@incubator.apache.org>
>> Date: Monday, February 1, 2016 at 4:20 AM
>> To: "general@incubator.apache.org" <general@incubator.apache.org>
>> Cc: "p...@cs.jhu.edu" <p...@cs.jhu.edu>
>> Subject: Re: [VOTE] Accept Joshua as an Apache Incubator Podling
>> 
>>> I know this is specifically called-out in the proposal, but it
>>> does seem worthy of further discussion.
>>> 
>>> This has a pretty small list of initial committers, esp when one considers
>>> how over-booked 2 of them appear to be.
>>> 
>>> So, realistically, how active do both Chris and Lewis expect
>>> to be?
>>> 
>>>> On Jan 30, 2016, at 3:00 PM, Mattmann, Chris A (3980)
>>>> <chris.a.mattm...@jpl.nasa.gov> wrote:
>>>> 
>>>> Hi Everyone,
>>>> 
>>>> OK the discussion is now completed. Please VOTE to accept Joshua
>>>> into the Apache Incubator. I’ll leave the VOTE open for at least
>>>> the next 72 hours, with hopes to close it next Friday the 5th of
>>>> February, 2016.
>>>> 
>>>> [ ] +1 Accept Joshua as an Apache Incubator podling.
>>>> [ ] +0 Abstain.
>>>> [ ] -1 Don’t accept Joshua as an Apache Incubator podling because..
>>>> 
>>>> Of course, I am +1 on this. Please note VOTEs from Incubator PMC
>>>> members are binding but all are welcome to VOTE!
>>>> 
>>>> Cheers,
>>>> Chris
>>>> 
>>>> ++
>>>> Chris Mattmann, Ph.D.
>>>> Chief Architect
>>>> Instrument Software and Science Data Systems Section (398)
>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>> Office: 168-519, Mailstop: 168-527
>>>> Email: chris.a.mattm...@nasa.gov
>>>> WWW:  http://sunset.usc.edu/~mattmann/
>>>> ++
>>>> Adjunct Associate Professor, Computer Science Department
>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>> ++
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: jpluser <chris.a.mattm...@jpl.n

Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-02-01 Thread Jim Jagielski
I know this is specifically called-out in the proposal, but it
does seem worthy of further discussion.

This has a pretty small list of initial committers, esp when one considers
how over-booked 2 of them appear to be.

So, realistically, how active do both Chris and Lewis expect
to be?

> On Jan 30, 2016, at 3:00 PM, Mattmann, Chris A (3980) 
>  wrote:
> 
> Hi Everyone,
> 
> OK the discussion is now completed. Please VOTE to accept Joshua
> into the Apache Incubator. I’ll leave the VOTE open for at least
> the next 72 hours, with hopes to close it next Friday the 5th of
> February, 2016.
> 
> [ ] +1 Accept Joshua as an Apache Incubator podling.
> [ ] +0 Abstain.
> [ ] -1 Don’t accept Joshua as an Apache Incubator podling because..
> 
> Of course, I am +1 on this. Please note VOTEs from Incubator PMC
> members are binding but all are welcome to VOTE!
> 
> Cheers,
> Chris
> 
> ++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
> 
> 
> 
> 
> 
> -----Original Message-----
> From: jpluser 
> Date: Tuesday, January 12, 2016 at 10:56 PM
> To: "general@incubator.apache.org" 
> Cc: "p...@cs.jhu.edu" 
> Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation
> Toolkit
> 
>> Hi Everyone,
>> 
>> Please find attached for your viewing pleasure a proposed new project,
>> Apache Joshua, a statistical machine translation toolkit. The proposal
>> is in wiki draft form at: https://wiki.apache.org/incubator/JoshuaProposal
>> 
>> Proposal text is copied below. I’ll leave the discussion open for a week
>> and we are interested in folks who would like to be initial committers
>> and mentors. Please discuss here on the thread.
>> 
>> Thanks!
>> 
>> Cheers,
>> Chris (Champion)
>> 
>> ———
>> 
>> = Joshua Proposal =
>> 
>> == Abstract ==
>> [[joshua-decoder.org|Joshua]] is an open-source statistical machine
>> translation toolkit. It includes a Java-based decoder for translating with
>> phrase-based, hierarchical, and syntax-based translation models, a
>> Hadoop-based grammar extractor (Thrax), and an extensive set of tools and
>> scripts for training and evaluating new models from parallel text.
>> 
>> == Proposal ==
>> Joshua is a state of the art statistical machine translation system that
>> provides a number of features:
>> 
>> * Support for the two main paradigms in statistical machine translation:
>> phrase-based and hierarchical / syntactic.
>> * A sparse feature API that makes it easy to add new feature templates
>> supporting millions of features
>> * Native implementations of many tuners (MERT, MIRA, PRO, and AdaGrad)
>> * Support for lattice decoding, allowing upstream NLP tools to expose
>> their hypothesis space to the MT system
>> * An efficient representation for models, allowing for quick loading of
>> multi-gigabyte model files
>> * Fast decoding speed (on par with Moses and mtplz)
>> * Language packs — precompiled models that allow the decoder to be run as
>> a black box
>> * Thrax, a Hadoop-based tool for learning translation models from
>> parallel text
>> * A suite of tools for constructing new models for any language pair for
>> which sufficient training data exists
>> 
>> == Background and Rationale ==
>> A number of factors make this a good time for an Apache project focused on
>> machine translation (MT): the quality of MT output (for many language
>> pairs); the average computing resources available on computers, relative
>> to the needs of MT systems; and the availability of a number of
>> high-quality toolkits, together with a large base of researchers working
>> on them.
>> 
>> Over the past decade, machine translation (MT; the automatic translation
>> of one human language to another) has become a reality. The research into
>> statistical approaches to translation that began in the early nineties,
>> together with the availability of large amounts of training data, and
>> better computing infrastructure, have all come together to produce
>> translation results that are “good enough” for a large set of language
>> pairs and use cases. Free services like
>> [[https://www.bing.com/translator|Bing Translator]] and
>> [[https://translate.google.com|Google Translate]] have made these services
>> available to the average person through direct interfaces and through
>> tools like browser plugins, and sites across the world with higher
>> 

Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-02-01 Thread Tom Barber
Hello! I'm a code-aholic, you'll be getting regular commits from me.

Regards,

Tom

On Mon, Feb 1, 2016 at 3:20 PM, Mattmann, Chris A (3980) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Hey Jim,
>
> This is a valid concern, one that I hope is mitigated by taking
> however long it takes in Incubation to attract some new committers
> to work on the project. Hopefully too you saw how long I took to
> allow the discussion to occur and so forth.
>
> Lewis has actively contributed to Joshua already - you can see -
> via the HomeBrew package he created, see:
>
> https://github.com/Homebrew/homebrew/pull/45746
>
>
> You can see too it wasn’t something just recent or something
> super quick it’s something he had to work at.
>
> As for me, my involvement is going to be limited, but I am
> actively pursuing Tika’s integration with Joshua as part of
> TIKA-1343: http://issues.apache.org/jira/browse/TIKA-1343.
>
> Finally my suspicion is that Tom, Henry and Tommaso will
> contribute a lot as well.
>
> Thanks for listening.
>
> Cheers,
> Chris
>
> ++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
>
>
>
>
>
> -----Original Message-----
> From: Jim Jagielski <j...@jagunet.com>
> Reply-To: "general@incubator.apache.org" <general@incubator.apache.org>
> Date: Monday, February 1, 2016 at 4:20 AM
> To: "general@incubator.apache.org" <general@incubator.apache.org>
> Cc: "p...@cs.jhu.edu" <p...@cs.jhu.edu>
> Subject: Re: [VOTE] Accept Joshua as an Apache Incubator Podling
>
> >I know this is specifically called-out in the proposal, but it
> >does seem worthy of further discussion.
> >
> >This has a pretty small list of initial committers, esp when one considers
> >how over-booked 2 of them appear to be.
> >
> >So, realistically, how active do both Chris and Lewis expect
> >to be?
> >
> >> On Jan 30, 2016, at 3:00 PM, Mattmann, Chris A (3980)
> >><chris.a.mattm...@jpl.nasa.gov> wrote:
> >>
> >> Hi Everyone,
> >>
> >> OK the discussion is now completed. Please VOTE to accept Joshua
> >> into the Apache Incubator. I’ll leave the VOTE open for at least
> >> the next 72 hours, with hopes to close it next Friday the 5th of
> >> February, 2016.
> >>
> >> [ ] +1 Accept Joshua as an Apache Incubator podling.
> >> [ ] +0 Abstain.
> >> [ ] -1 Don’t accept Joshua as an Apache Incubator podling because..
> >>
> >> Of course, I am +1 on this. Please note VOTEs from Incubator PMC
> >> members are binding but all are welcome to VOTE!
> >>
> >> Cheers,
> >> Chris
> >>
> >> ++
> >> Chris Mattmann, Ph.D.
> >> Chief Architect
> >> Instrument Software and Science Data Systems Section (398)
> >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >> Office: 168-519, Mailstop: 168-527
> >> Email: chris.a.mattm...@nasa.gov
> >> WWW:  http://sunset.usc.edu/~mattmann/
> >> ++
> >> Adjunct Associate Professor, Computer Science Department
> >> University of Southern California, Los Angeles, CA 90089 USA
> >> ++
> >>
> >>
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: jpluser <chris.a.mattm...@jpl.nasa.gov>
> >> Date: Tuesday, January 12, 2016 at 10:56 PM
> >> To: "general@incubator.apache.org" <general@incubator.apache.org>
> >> Cc: "p...@cs.jhu.edu" <p...@cs.jhu.edu>
> >> Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation Toolkit
> >>
> >>> Hi Everyone,
> >>>
> >>> Please find attached for your viewing pleasure a proposed new project,
> >>> Apache Joshua, a statistical machine translation toolkit. The propos

Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-02-01 Thread Mattmann, Chris A (3980)
Hey Jim,

This is a valid concern, one that I hope is mitigated by taking
however long it takes in Incubation to attract some new committers
to work on the project. Hopefully too you saw how long I took to
allow the discussion to occur and so forth.

Lewis has actively contributed to Joshua already - you can see -
via the HomeBrew package he created, see:

https://github.com/Homebrew/homebrew/pull/45746


You can see too it wasn’t something just recent or something
super quick it’s something he had to work at.

As for me, my involvement is going to be limited, but I am
actively pursuing Tika’s integration with Joshua as part of
TIKA-1343: http://issues.apache.org/jira/browse/TIKA-1343.

Finally my suspicion is that Tom, Henry and Tommaso will
contribute a lot as well.

Thanks for listening.

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++





-----Original Message-----
From: Jim Jagielski <j...@jagunet.com>
Reply-To: "general@incubator.apache.org" <general@incubator.apache.org>
Date: Monday, February 1, 2016 at 4:20 AM
To: "general@incubator.apache.org" <general@incubator.apache.org>
Cc: "p...@cs.jhu.edu" <p...@cs.jhu.edu>
Subject: Re: [VOTE] Accept Joshua as an Apache Incubator Podling

>I know this is specifically called-out in the proposal, but it
>does seem worthy of further discussion.
>
>This has a pretty small list of initial committers, esp when one considers
>how over-booked 2 of them appear to be.
>
>So, realistically, how active do both Chris and Lewis expect
>to be?
>
>> On Jan 30, 2016, at 3:00 PM, Mattmann, Chris A (3980)
>><chris.a.mattm...@jpl.nasa.gov> wrote:
>> 
>> Hi Everyone,
>> 
>> OK the discussion is now completed. Please VOTE to accept Joshua
>> into the Apache Incubator. I’ll leave the VOTE open for at least
>> the next 72 hours, with hopes to close it next Friday the 5th of
>> February, 2016.
>> 
>> [ ] +1 Accept Joshua as an Apache Incubator podling.
>> [ ] +0 Abstain.
>> [ ] -1 Don’t accept Joshua as an Apache Incubator podling because..
>> 
>> Of course, I am +1 on this. Please note VOTEs from Incubator PMC
>> members are binding but all are welcome to VOTE!
>> 
>> Cheers,
>> Chris
>> 
>> ++
>> Chris Mattmann, Ph.D.
>> Chief Architect
>> Instrument Software and Science Data Systems Section (398)
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 168-519, Mailstop: 168-527
>> Email: chris.a.mattm...@nasa.gov
>> WWW:  http://sunset.usc.edu/~mattmann/
>> ++
>> Adjunct Associate Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++
>> 
>> 
>> 
>> 
>> 
>> -----Original Message-----
>> From: jpluser <chris.a.mattm...@jpl.nasa.gov>
>> Date: Tuesday, January 12, 2016 at 10:56 PM
>> To: "general@incubator.apache.org" <general@incubator.apache.org>
>> Cc: "p...@cs.jhu.edu" <p...@cs.jhu.edu>
>> Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation Toolkit
>> 
>>> Hi Everyone,
>>> 
>>> Please find attached for your viewing pleasure a proposed new project,
>>> Apache Joshua, a statistical machine translation toolkit. The proposal
>>> is in wiki draft form at: https://wiki.apache.org/incubator/JoshuaProposal
>>> 
>>> Proposal text is copied below. I’ll leave the discussion open for a week
>>> and we are interested in folks who would like to be initial committers
>>> and mentors. Please discuss here on the thread.
>>> 
>>> Thanks!
>>> 
>>> Cheers,
>>> Chris (Champion)
>>> 
>>> ———
>>> 
>>> = Joshua Proposal =
>>> 
>>> == Abstract ==
>>> [[joshua-decoder.org|Joshua]] is an open-source statistical machine
>>> translation toolkit. It includes a Java-based decoder

Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-01-30 Thread Henry Saputra
+1 (binding)

On Saturday, January 30, 2016, Mattmann, Chris A (3980) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Hi Everyone,
>
> OK the discussion is now completed. Please VOTE to accept Joshua
> into the Apache Incubator. I’ll leave the VOTE open for at least
> the next 72 hours, with hopes to close it next Friday the 5th of
> February, 2016.
>
> [ ] +1 Accept Joshua as an Apache Incubator podling.
> [ ] +0 Abstain.
> [ ] -1 Don’t accept Joshua as an Apache Incubator podling because..
>
> Of course, I am +1 on this. Please note VOTEs from Incubator PMC
> members are binding but all are welcome to VOTE!
>
> Cheers,
> Chris
>
> ++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov 
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
>
>
>
>
>
> -----Original Message-----
> From: jpluser
> Date: Tuesday, January 12, 2016 at 10:56 PM
> To: "general@incubator.apache.org" <general@incubator.apache.org>
> Cc: "p...@cs.jhu.edu"
> Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation
> Toolkit
>
> >Hi Everyone,
> >
> >Please find attached for your viewing pleasure a proposed new project,
> >Apache Joshua, a statistical machine translation toolkit. The proposal
> >is in wiki draft form at: https://wiki.apache.org/incubator/JoshuaProposal
> >
> >Proposal text is copied below. I’ll leave the discussion open for a week
> >and we are interested in folks who would like to be initial committers
> >and mentors. Please discuss here on the thread.
> >
> >Thanks!
> >
> >Cheers,
> >Chris (Champion)
> >
> >———
> >
> >= Joshua Proposal =
> >
> >== Abstract ==
> >[[joshua-decoder.org|Joshua]] is an open-source statistical machine
> >translation toolkit. It includes a Java-based decoder for translating with
> >phrase-based, hierarchical, and syntax-based translation models, a
> >Hadoop-based grammar extractor (Thrax), and an extensive set of tools and
> >scripts for training and evaluating new models from parallel text.
> >
> >== Proposal ==
> >Joshua is a state of the art statistical machine translation system that
> >provides a number of features:
> >
> > * Support for the two main paradigms in statistical machine translation:
> >phrase-based and hierarchical / syntactic.
> > * A sparse feature API that makes it easy to add new feature templates
> >supporting millions of features
> > * Native implementations of many tuners (MERT, MIRA, PRO, and AdaGrad)
> > * Support for lattice decoding, allowing upstream NLP tools to expose
> >their hypothesis space to the MT system
> > * An efficient representation for models, allowing for quick loading of
> >multi-gigabyte model files
> > * Fast decoding speed (on par with Moses and mtplz)
> > * Language packs — precompiled models that allow the decoder to be run as
> >a black box
> > * Thrax, a Hadoop-based tool for learning translation models from
> >parallel text
> > * A suite of tools for constructing new models for any language pair for
> >which sufficient training data exists
> >
> >== Background and Rationale ==
> >A number of factors make this a good time for an Apache project focused on
> >machine translation (MT): the quality of MT output (for many language
> >pairs); the average computing resources available on computers, relative
> >to the needs of MT systems; and the availability of a number of
> >high-quality toolkits, together with a large base of researchers working
> >on them.
> >
> >Over the past decade, machine translation (MT; the automatic translation
> >of one human language to another) has become a reality. The research into
> >statistical approaches to translation that began in the early nineties,
> >together with the availability of large amounts of training data, and
> >better computing infrastructure, have all come together to produce
> >translation results that are “good enough” for a large set of language
> >pairs and use cases. Free services like
> >[[https://www.bing.com/translator|Bing Translator]] and
> >[[https://translate.google.com|Google Translate]] have made these services
> >available to the average person through direct interfaces and through
> >tools like browser plugins, and sites across the world with higher
> >translation needs use them to translate their pages automatically.
> >
> >MT does not require the infrastructure of large corporations in order to
> >produce feasible output. Machine 

Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-01-30 Thread Ashish
+ (non-binding)

On Sat, Jan 30, 2016 at 12:00 PM, Mattmann, Chris A (3980)
 wrote:
> Hi Everyone,
>
> OK the discussion is now completed. Please VOTE to accept Joshua
> into the Apache Incubator. I’ll leave the VOTE open for at least
> the next 72 hours, with hopes to close it next Friday the 5th of
> February, 2016.
>
> [ ] +1 Accept Joshua as an Apache Incubator podling.
> [ ] +0 Abstain.
> [ ] -1 Don’t accept Joshua as an Apache Incubator podling because..
>
> Of course, I am +1 on this. Please note VOTEs from Incubator PMC
> members are binding but all are welcome to VOTE!
>
> Cheers,
> Chris
>
> ++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
>
>
>
>
>
> -----Original Message-----
> From: jpluser 
> Date: Tuesday, January 12, 2016 at 10:56 PM
> To: "general@incubator.apache.org" 
> Cc: "p...@cs.jhu.edu" 
> Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation
> Toolkit
>
>>Hi Everyone,
>>
>>Please find attached for your viewing pleasure a proposed new project,
>>Apache Joshua, a statistical machine translation toolkit. The proposal
>>is in wiki draft form at: https://wiki.apache.org/incubator/JoshuaProposal
>>
>>Proposal text is copied below. I’ll leave the discussion open for a week
>>and we are interested in folks who would like to be initial committers
>>and mentors. Please discuss here on the thread.
>>
>>Thanks!
>>
>>Cheers,
>>Chris (Champion)
>>
>>———
>>
>>= Joshua Proposal =
>>
>>== Abstract ==
>>[[joshua-decoder.org|Joshua]] is an open-source statistical machine
>>translation toolkit. It includes a Java-based decoder for translating with
>>phrase-based, hierarchical, and syntax-based translation models, a
>>Hadoop-based grammar extractor (Thrax), and an extensive set of tools and
>>scripts for training and evaluating new models from parallel text.
>>
>>== Proposal ==
>>Joshua is a state of the art statistical machine translation system that
>>provides a number of features:
>>
>> * Support for the two main paradigms in statistical machine translation:
>>phrase-based and hierarchical / syntactic.
>> * A sparse feature API that makes it easy to add new feature templates
>>supporting millions of features
>> * Native implementations of many tuners (MERT, MIRA, PRO, and AdaGrad)
>> * Support for lattice decoding, allowing upstream NLP tools to expose
>>their hypothesis space to the MT system
>> * An efficient representation for models, allowing for quick loading of
>>multi-gigabyte model files
>> * Fast decoding speed (on par with Moses and mtplz)
>> * Language packs — precompiled models that allow the decoder to be run as
>>a black box
>> * Thrax, a Hadoop-based tool for learning translation models from
>>parallel text
>> * A suite of tools for constructing new models for any language pair for
>>which sufficient training data exists
>>
>>== Background and Rationale ==
>>A number of factors make this a good time for an Apache project focused on
>>machine translation (MT): the quality of MT output (for many language
>>pairs); the average computing resources available on computers, relative
>>to the needs of MT systems; and the availability of a number of
>>high-quality toolkits, together with a large base of researchers working
>>on them.
>>
>>Over the past decade, machine translation (MT; the automatic translation
>>of one human language to another) has become a reality. The research into
>>statistical approaches to translation that began in the early nineties,
>>together with the availability of large amounts of training data, and
>>better computing infrastructure, have all come together to produce
>>translation results that are “good enough” for a large set of language
>>pairs and use cases. Free services like
>>[[https://www.bing.com/translator|Bing Translator]] and
>>[[https://translate.google.com|Google Translate]] have made these services
>>available to the average person through direct interfaces and through
>>tools like browser plugins, and sites across the world with higher
>>translation needs use them to translate their pages automatically.
>>
>>MT does not require the infrastructure of large corporations in order to
>>produce feasible output. Machine translation can be resource-intensive,
>>but need not be prohibitively so. Disk and memory usage are mostly a
>>matter of model size, which for most language pairs 

Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-01-30 Thread Tom Barber
+1 binding

Should be a very interesting project!

On Sat, Jan 30, 2016 at 8:05 PM, Ashish  wrote:

> + (non-binding)
>
> On Sat, Jan 30, 2016 at 12:00 PM, Mattmann, Chris A (3980)
>  wrote:
> > Hi Everyone,
> >
> > OK the discussion is now completed. Please VOTE to accept Joshua
> > into the Apache Incubator. I’ll leave the VOTE open for at least
> > the next 72 hours, with hopes to close it next Friday the 5th of
> > February, 2016.
> >
> > [ ] +1 Accept Joshua as an Apache Incubator podling.
> > [ ] +0 Abstain.
> > [ ] -1 Don’t accept Joshua as an Apache Incubator podling because..
> >
> > Of course, I am +1 on this. Please note VOTEs from Incubator PMC
> > members are binding but all are welcome to VOTE!
> >
> > Cheers,
> > Chris
> >
> > ++
> > Chris Mattmann, Ph.D.
> > Chief Architect
> > Instrument Software and Science Data Systems Section (398)
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 168-519, Mailstop: 168-527
> > Email: chris.a.mattm...@nasa.gov
> > WWW:  http://sunset.usc.edu/~mattmann/
> > ++
> > Adjunct Associate Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > ++
> >
> >
> >
> >
> >
> > -Original Message-
> > From: jpluser 
> > Date: Tuesday, January 12, 2016 at 10:56 PM
> > To: "general@incubator.apache.org" 
> > Cc: "p...@cs.jhu.edu" 
> > Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation
> > Toolkit
> >
> >>Hi Everyone,
> >>
> >>Please find attached for your viewing pleasure a proposed new project,
> >>Apache Joshua, a statistical machine translation toolkit. The proposal
> >>is in wiki draft form at: https://wiki.apache.org/incubator/JoshuaProposal
> >>
> >>Proposal text is copied below. I’ll leave the discussion open for a week
> >>and we are interested in folks who would like to be initial committers
> >>and mentors. Please discuss here on the thread.
> >>
> >>Thanks!
> >>
> >>Cheers,
> >>Chris (Champion)
> >>
> >>———
> >>
> >>= Joshua Proposal =
> >>
> >>== Abstract ==
> >>[[joshua-decoder.org|Joshua]] is an open-source statistical machine
> >>translation toolkit. It includes a Java-based decoder for translating with
> >>phrase-based, hierarchical, and syntax-based translation models, a
> >>Hadoop-based grammar extractor (Thrax), and an extensive set of tools and
> >>scripts for training and evaluating new models from parallel text.
> >>
> >>== Proposal ==
> >>Joshua is a state-of-the-art statistical machine translation system that
> >>provides a number of features:
> >>
> >> * Support for the two main paradigms in statistical machine translation:
> >>phrase-based and hierarchical / syntactic.
> >> * A sparse feature API that makes it easy to add new feature templates
> >>supporting millions of features
> >> * Native implementations of many tuners (MERT, MIRA, PRO, and AdaGrad)
> >> * Support for lattice decoding, allowing upstream NLP tools to expose
> >>their hypothesis space to the MT system
> >> * An efficient representation for models, allowing for quick loading of
> >>multi-gigabyte model files
> >> * Fast decoding speed (on par with Moses and mtplz)
> >> * Language packs — precompiled models that allow the decoder to be run as
> >>a black box
> >> * Thrax, a Hadoop-based tool for learning translation models from
> >>parallel text
> >> * A suite of tools for constructing new models for any language pair for
> >>which sufficient training data exists
> >>
> >>== Background and Rationale ==
> >>A number of factors make this a good time for an Apache project focused on
> >>machine translation (MT): the quality of MT output (for many language
> >>pairs); the average computing resources available on computers, relative
> >>to the needs of MT systems; and the availability of a number of
> >>high-quality toolkits, together with a large base of researchers working
> >>on them.
> >>
> >>Over the past decade, machine translation (MT; the automatic translation
> >>of one human language to another) has become a reality. The research into
> >>statistical approaches to translation that began in the early nineties,
> >>together with the availability of large amounts of training data, and
> >>better computing infrastructure, have all come together to produce
> >>translation results that are “good enough” for a large set of language
> >>pairs and use cases. Free services like
> >>[[https://www.bing.com/translator|Bing Translator]] and
> >>[[https://translate.google.com|Google Translate]] have made these services
> >>available to the average person through direct interfaces and through
> >>tools like browser plugins, and sites across the world with 

Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-01-30 Thread Luke Han
+1 non-binding


Best Regards!
-

Luke Han

On Sun, Jan 31, 2016 at 5:27 AM, Tom Barber  wrote:

> +1 binding
>
> Should be a very interesting project!
>
> On Sat, Jan 30, 2016 at 8:05 PM, Ashish  wrote:
>
> > + (non-binding)
> >

Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-01-30 Thread Henri Yandell
+1 (non-binding).

On Sat, Jan 30, 2016 at 5:45 PM, Luke Han  wrote:

> +1 non-binding
>
>
> Best Regards!
> -
>
> Luke Han
>
> On Sun, Jan 31, 2016 at 5:27 AM, Tom Barber 
> wrote:
>
> > +1 binding
> >
> > Should be a very interesting project!
> >
> > On Sat, Jan 30, 2016 at 8:05 PM, Ashish  wrote:
> >
> > > + (non-binding)
> > >

Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-01-30 Thread Jean-Baptiste Onofré

+1 (binding)

Regards
JB

On 01/30/2016 09:00 PM, Mattmann, Chris A (3980) wrote:

Hi Everyone,

OK the discussion is now completed. Please VOTE to accept Joshua
into the Apache Incubator. I’ll leave the VOTE open for at least
the next 72 hours, with hopes to close it next Friday the 5th of
February, 2016.

[ ] +1 Accept Joshua as an Apache Incubator podling.
[ ] +0 Abstain.
[ ] -1 Don’t accept Joshua as an Apache Incubator podling because..

Of course, I am +1 on this. Please note VOTEs from Incubator PMC
members are binding but all are welcome to VOTE!

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++





-Original Message-
From: jpluser 
Date: Tuesday, January 12, 2016 at 10:56 PM
To: "general@incubator.apache.org" 
Cc: "p...@cs.jhu.edu" 
Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation
Toolkit


Hi Everyone,

Please find attached for your viewing pleasure a proposed new project,
Apache Joshua, a statistical machine translation toolkit. The proposal
is in wiki draft form at: https://wiki.apache.org/incubator/JoshuaProposal

Proposal text is copied below. I’ll leave the discussion open for a week
and we are interested in folks who would like to be initial committers
and mentors. Please discuss here on the thread.

Thanks!

Cheers,
Chris (Champion)

———

= Joshua Proposal =

== Abstract ==
[[joshua-decoder.org|Joshua]] is an open-source statistical machine
translation toolkit. It includes a Java-based decoder for translating with
phrase-based, hierarchical, and syntax-based translation models, a
Hadoop-based grammar extractor (Thrax), and an extensive set of tools and
scripts for training and evaluating new models from parallel text.

== Proposal ==
Joshua is a state-of-the-art statistical machine translation system that
provides a number of features:

* Support for the two main paradigms in statistical machine translation:
phrase-based and hierarchical / syntactic.
* A sparse feature API that makes it easy to add new feature templates
supporting millions of features
* Native implementations of many tuners (MERT, MIRA, PRO, and AdaGrad)
* Support for lattice decoding, allowing upstream NLP tools to expose
their hypothesis space to the MT system
* An efficient representation for models, allowing for quick loading of
multi-gigabyte model files
* Fast decoding speed (on par with Moses and mtplz)
* Language packs — precompiled models that allow the decoder to be run as
a black box
* Thrax, a Hadoop-based tool for learning translation models from
parallel text
* A suite of tools for constructing new models for any language pair for
which sufficient training data exists

== Background and Rationale ==
A number of factors make this a good time for an Apache project focused on
machine translation (MT): the quality of MT output (for many language
pairs); the average computing resources available on computers, relative
to the needs of MT systems; and the availability of a number of
high-quality toolkits, together with a large base of researchers working
on them.

Over the past decade, machine translation (MT; the automatic translation
of one human language to another) has become a reality. The research into
statistical approaches to translation that began in the early nineties,
together with the availability of large amounts of training data, and
better computing infrastructure, have all come together to produce
translation results that are “good enough” for a large set of language
pairs and use cases. Free services like
[[https://www.bing.com/translator|Bing Translator]] and
[[https://translate.google.com|Google Translate]] have made these services
available to the average person through direct interfaces and through
tools like browser plugins, and sites across the world with higher
translation needs use them to translate their pages automatically.

MT does not require the infrastructure of large corporations in order to
produce feasible output. Machine translation can be resource-intensive,
but need not be prohibitively so. Disk and memory usage are mostly a
matter of model size, which for most language pairs is a few gigabytes at
most, at which size models can provide coverage on the order of tens or
even hundreds of thousands of words in the input and output languages. The
computational complexity of the algorithms used to search for translations
of new 

Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-01-30 Thread Tommaso Teofili
+1 (binding)

Tommaso

2016-01-30 21:00 GMT+01:00 Mattmann, Chris A (3980) <
chris.a.mattm...@jpl.nasa.gov>:

> Hi Everyone,
>
> OK the discussion is now completed. Please VOTE to accept Joshua
> into the Apache Incubator. I’ll leave the VOTE open for at least
> the next 72 hours, with hopes to close it next Friday the 5th of
> February, 2016.
>
> [ ] +1 Accept Joshua as an Apache Incubator podling.
> [ ] +0 Abstain.
> [ ] -1 Don’t accept Joshua as an Apache Incubator podling because..
>
> Of course, I am +1 on this. Please note VOTEs from Incubator PMC
> members are binding but all are welcome to VOTE!
>
> Cheers,
> Chris
>
> ++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
>
>
>
>
>
> -Original Message-
> From: jpluser 
> Date: Tuesday, January 12, 2016 at 10:56 PM
> To: "general@incubator.apache.org" 
> Cc: "p...@cs.jhu.edu" 
> Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation
> Toolkit
>
> >Hi Everyone,
> >
> >Please find attached for your viewing pleasure a proposed new project,
> >Apache Joshua, a statistical machine translation toolkit. The proposal
> >is in wiki draft form at: https://wiki.apache.org/incubator/JoshuaProposal
> >
> >Proposal text is copied below. I’ll leave the discussion open for a week
> >and we are interested in folks who would like to be initial committers
> >and mentors. Please discuss here on the thread.
> >
> >Thanks!
> >
> >Cheers,
> >Chris (Champion)
> >
> >———
> >
> >= Joshua Proposal =
> >
> >== Abstract ==
> >[[joshua-decoder.org|Joshua]] is an open-source statistical machine
> >translation toolkit. It includes a Java-based decoder for translating with
> >phrase-based, hierarchical, and syntax-based translation models, a
> >Hadoop-based grammar extractor (Thrax), and an extensive set of tools and
> >scripts for training and evaluating new models from parallel text.
> >
> >== Proposal ==
> >Joshua is a state-of-the-art statistical machine translation system that
> >provides a number of features:
> >
> > * Support for the two main paradigms in statistical machine translation:
> >phrase-based and hierarchical / syntactic.
> > * A sparse feature API that makes it easy to add new feature templates
> >supporting millions of features
> > * Native implementations of many tuners (MERT, MIRA, PRO, and AdaGrad)
> > * Support for lattice decoding, allowing upstream NLP tools to expose
> >their hypothesis space to the MT system
> > * An efficient representation for models, allowing for quick loading of
> >multi-gigabyte model files
> > * Fast decoding speed (on par with Moses and mtplz)
> > * Language packs — precompiled models that allow the decoder to be run as
> >a black box
> > * Thrax, a Hadoop-based tool for learning translation models from
> >parallel text
> > * A suite of tools for constructing new models for any language pair for
> >which sufficient training data exists
> >
> >== Background and Rationale ==
> >A number of factors make this a good time for an Apache project focused on
> >machine translation (MT): the quality of MT output (for many language
> >pairs); the average computing resources available on computers, relative
> >to the needs of MT systems; and the availability of a number of
> >high-quality toolkits, together with a large base of researchers working
> >on them.
> >
> >Over the past decade, machine translation (MT; the automatic translation
> >of one human language to another) has become a reality. The research into
> >statistical approaches to translation that began in the early nineties,
> >together with the availability of large amounts of training data, and
> >better computing infrastructure, have all come together to produce
> >translation results that are “good enough” for a large set of language
> >pairs and use cases. Free services like
> >[[https://www.bing.com/translator|Bing Translator]] and
> >[[https://translate.google.com|Google Translate]] have made these services
> >available to the average person through direct interfaces and through
> >tools like browser plugins, and sites across the world with higher
> >translation needs use them to translate their pages automatically.
> >
> >MT does not require the infrastructure of large corporations in order to
> >produce feasible output. Machine translation can be resource-intensive,
> >but need not be prohibitively so. Disk and memory