Thank you! I'll give 'em a try.

---
Jeremy B. Merrill
The New York Times

On Mon, Mar 20, 2017 at 4:01 PM, Mattmann, Chris A (3010) <[email protected]> wrote:

> I would try 'em all out, honestly. Performance-wise and setup-wise they are
> kind of different, though Tika boils it down to a config file for each,
> which is nice. I am working on a paper that compares all of them but am
> not done yet ;)
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Principal Data Scientist, Engineering Administrative Office (3010)
> Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 180-503E, Mailstop: 180-503
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> *From: *"Merrill, Jeremy" <[email protected]>
> *Reply-To: *"[email protected]" <[email protected]>
> *Date: *Monday, March 20, 2017 at 11:59 AM
> *To: *"[email protected]" <[email protected]>
> *Subject: *Re: machine translation recommendation for use with Tika?
>
> Hi Chris,
>
> Thank you, this is helpful. I think running our own system is out of the
> question, just on account of time (News just keeps on happening. Though
> it'd certainly be fun to play with...) and -- presumably -- result
> quality.
>
> Do you have thoughts on which of Google, Microsoft and Lingo24 might be
> easiest? Or are they all just as easy to use with Tika and I should just
> try 'em all out?
>
> Thanks,
>
> ---
> Jeremy B. Merrill
> The New York Times
>
> On Mon, Mar 20, 2017 at 1:43 PM, Mattmann, Chris A (3010) <[email protected]> wrote:
>
> Hi Jeremy,
>
> Thanks for reaching out.
>
> So far I have had really good experience with the Lingo24 translator. It
> really depends, though, on which of two families fits what you are trying
> to do. If you want the broadest coverage and the most heavily trained
> translation, Google, Microsoft, and Lingo24 fall into the remote
> translation API service category. They all have tons of data and
> training. I also think all of them use human curators for quality review
> of some things. All will eventually cost you. I know that you get some
> X million characters of translation a month in the services.
>
> At the other end, you can deploy your own Apache Joshua (incubating)
> and/or Moses MT system and then have Tika connect to it as a service. In
> this case you control the costs and can run it on your own servers, etc.,
> but you are limited by the quality of your trained models and by your
> language pairs.
>
> Does this make sense?
>
> Cheers,
> Chris
>
> *From: *"Merrill, Jeremy" <[email protected]>
> *Reply-To: *"[email protected]" <[email protected]>
> *Date: *Monday, March 20, 2017 at 8:30 AM
> *To: *"[email protected]" <[email protected]>
> *Subject: *machine translation recommendation for use with Tika?
>
> Hi friends,
>
> I've been tasked with figuring out how to machine-translate a large set of
> documents from a common European language into English, using a system that
> already utilizes Tika.
>
> I know Tika integrates with a handful of machine-translation APIs
> <https://tika.apache.org/1.14/api/org/apache/tika/language/translate/package-summary.html>.
> Do you all have a sense of which works best, both in terms of translation
> quality and ease of integration with Tika?
>
> (We know we're going to have to pay, but the amount of content won't be
> huge, so differences in price aren't a big factor.)
>
> Thanks in advance,
>
> Jeremy B. Merrill
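
For anyone following the "try 'em all out" advice in this thread, a side-by-side comparison might be sketched roughly as below. This is only an illustration under stated assumptions: the function `compare_backends`, the callable signature `(text, source, target)`, and the two stand-in backends are all hypothetical and are not Tika's API. In practice each callable would wrap whichever service (Google, Microsoft, Lingo24) has been configured.

```python
from typing import Callable, Dict

# Hypothetical signature: (text, source_lang, target_lang) -> translated text.
TranslateFn = Callable[[str, str, str], str]


def compare_backends(backends: Dict[str, TranslateFn], sample: str,
                     source: str, target: str = "en") -> Dict[str, str]:
    """Run one sample through every backend, recording errors instead of raising."""
    results: Dict[str, str] = {}
    for name, translate in backends.items():
        try:
            results[name] = translate(sample, source, target)
        except Exception as exc:
            # A misconfigured backend should not stop the whole comparison.
            results[name] = f"<failed: {exc}>"
    return results


# Stand-in backends for illustration only; they do not translate anything.
def fake_upper(text: str, source: str, target: str) -> str:
    return text.upper()


def fake_broken(text: str, source: str, target: str) -> str:
    raise RuntimeError("missing API key")


if __name__ == "__main__":
    output = compare_backends(
        {"stub-a": fake_upper, "stub-b": fake_broken}, "bonjour", "fr"
    )
    for name, text in output.items():
        print(f"{name}: {text}")
```

Running the same handful of representative documents through each service and reading the results next to each other is usually enough to judge quality for one language pair before committing to a provider.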
