Hi Chris,

Thank you, this is helpful. I think running our own system is out of the question, just on account of time (News just keeps on happening. Though it'd certainly be fun to play with...) and -- presumably -- result quality.
Do you have thoughts on which of Google, Microsoft and Lingo24 might be easiest? Or are they all just as easy to use with Tika, and I should just try 'em all out?

Thanks,

---
Jeremy B. Merrill
The New York Times

On Mon, Mar 20, 2017 at 1:43 PM, Mattmann, Chris A (3010) <[email protected]> wrote:

> Hi Jeremy,
>
> Thanks for reaching out.
>
> So far I have had really good experience with the Lingo24 translator.
> It really depends, though, on what you are trying to do, and the
> options fall into two families. For example, if you want the broadest
> coverage and well-trained translation, Google, Microsoft, and Lingo24
> fall into the remote translation API service category. They all have
> tons of data and training. I also think all use human curators for
> quality review of some things. All will eventually cost you. I know
> that you get some X million characters of translation a month in the
> services.
>
> On the other end is deploying your own Apache Joshua (incubating)
> and/or Moses MT system and then having Tika connect to it as a
> service. In this case you control the costs and can run it on your own
> servers, etc., but you are limited by the quality of your trained
> models and by your language pairs.
>
> Does this make sense?
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Principal Data Scientist, Engineering Administrative Office (3010)
> Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 180-503E, Mailstop: 180-503
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> From: "Merrill, Jeremy" <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Monday, March 20, 2017 at 8:30 AM
> To: "[email protected]" <[email protected]>
> Subject: machine translation recommendation for use with Tika?
>
> Hi friends,
>
> I've been tasked with figuring out how to machine-translate a large
> set of documents from a common European language into English, using a
> system that already utilizes Tika.
>
> I know Tika integrates with a handful of machine-translation APIs
> <https://tika.apache.org/1.14/api/org/apache/tika/language/translate/package-summary.html>.
> Do you all have a sense of which works best, both in terms of
> translation quality and ease of integration with Tika?
>
> (We know we're going to have to pay, but the amount of content won't
> be huge, so differences in price aren't a big factor.)
>
> Thanks in advance,
>
> Jeremy B. Merrill
