Thank you! I'll give 'em a try.

---
Jeremy B. Merrill
The New York Times


On Mon, Mar 20, 2017 at 4:01 PM, Mattmann, Chris A (3010) <
[email protected]> wrote:

> Honestly, I would try 'em all out. Performance-wise and setup-wise they
> are kind of different, though. Tika boils it down to a config file for
> each, which is nice. I am working on a paper that compares all of them but
> am not done yet ;)
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> Chris Mattmann, Ph.D.
>
> Principal Data Scientist, Engineering Administrative Office (3010)
>
> Manager, NSF & Open Source Projects Formulation and Development Offices
> (8212)
>
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>
> Office: 180-503E, Mailstop: 180-503
>
> Email: [email protected]
>
> WWW:  http://sunset.usc.edu/~mattmann/
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> Director, Information Retrieval and Data Science Group (IRDS)
>
> Adjunct Associate Professor, Computer Science Department
>
> University of Southern California, Los Angeles, CA 90089 USA
>
> WWW: http://irds.usc.edu/
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> *From: *"Merrill, Jeremy" <[email protected]>
> *Reply-To: *"[email protected]" <[email protected]>
> *Date: *Monday, March 20, 2017 at 11:59 AM
> *To: *"[email protected]" <[email protected]>
> *Subject: *Re: machine translation recommendation for use with Tika?
>
> Hi Chris,
>
> Thank you, this is helpful. I think running our own system is out of the
> question, just on account of time (news just keeps on happening, though
> it'd certainly be fun to play with...) and, presumably, result quality.
>
> Do you have thoughts on which of Google, Microsoft, and Lingo24 might be
> easiest? Or are they all equally easy to use with Tika, so I should just
> try 'em all out?
>
> Thanks,
>
> ---
>
> Jeremy B. Merrill
>
> The New York Times
>
> On Mon, Mar 20, 2017 at 1:43 PM, Mattmann, Chris A (3010) <
> [email protected]> wrote:
>
> Hi Jeremy,
>
> Thanks for reaching out.
>
> So far I have had a really good experience with the Lingo24 translator. It
> really depends on what you are trying to do, though, and the options fall
> into two families. For example, if you want the widest coverage and the
> most extensively trained translation, Google, Microsoft, and Lingo24 fall
> into the remote translation API service category. They all have tons of
> data and training, and I think all of them also use human curators to
> review the quality of some output. All will eventually cost you. I know
> that you get some X million characters of translation a month in the
> services.
>
> At the other end, you can deploy your own Apache Joshua (incubating)
> and/or Moses MT system and then have Tika connect to it as a service. In
> this case you control the costs and can run it on your own servers, etc.,
> but you are limited by the quality of your trained models and by your
> language pairs.
>
> Does this make sense?
>
> Cheers,
>
> Chris
>
>
> *From: *"Merrill, Jeremy" <[email protected]>
> *Reply-To: *"[email protected]" <[email protected]>
> *Date: *Monday, March 20, 2017 at 8:30 AM
> *To: *"[email protected]" <[email protected]>
> *Subject: *machine translation recommendation for use with Tika?
>
> Hi friends,
>
> I've been tasked with figuring out how to machine-translate a large set of
> documents from a common European language into English, using a system that
> already utilizes Tika.
>
> I know Tika integrates with a handful of machine-translation APIs
> <https://tika.apache.org/1.14/api/org/apache/tika/language/translate/package-summary.html>.
> Do you all have a sense of which works best, both in terms of translation
> quality and ease of integration with Tika?
>
> (We know we're going to have to pay, but the amount of content won't be
> huge, so differences in price aren't a big factor.)
>
> Thanks in advance,
>
> Jeremy B. Merrill